Abstract

We consider asynchronous communication over point-to-point discrete memoryless channels without feedback. 
The transmitter starts sending one block codeword at an instant that is uniformly distributed within a certain time 
period, which represents the level of asynchronism. The receiver, by means of a sequential decoder, must isolate the 
message without knowing when the codeword transmission starts but being cognizant of the asynchronism level. 
We are interested in how quickly the receiver can isolate the sent message, particularly in the regime where the
asynchronism level is exponentially larger than the codeword length, which we refer to as 'strong asynchronism.' 

This model of sparse communication might represent the situation of a sensor that remains idle most of the 
time and, only occasionally, transmits information to a remote base station which needs to quickly take action. 
Because of the limited amount of energy the sensor possesses, assuming the same cost per transmitted symbol, it 
is of interest to consider minimum size codewords given the asynchronism level. 

The first result is an asymptotic characterization of the largest asynchronism level, in terms of the codeword 
length, for which reliable communication can be achieved: vanishing error probability can be guaranteed as the 
codeword length $N$ tends to infinity while the asynchronism level grows as $e^{N\alpha}$ if and only if $\alpha$ does not exceed
the synchronization threshold, a constant that admits a simple closed form expression, and is at least as large as 
the capacity of the synchronized channel. 

The second result is the characterization of a set of achievable strictly positive rates in the regime where 
the asynchronism level is exponential in the codeword length, and where the rate is defined with respect to the 
expected (random) delay between the time information starts being emitted until the time the receiver makes a 
decision. Interestingly, this achievability result is obtained by a coding strategy whose decoder not only operates
asynchronously, but has an almost universal decision rule, in the sense that it is almost independent of the
channel statistics. 

As an application of the first result we consider antipodal signaling over a Gaussian additive channel and 
derive a simple necessary condition between blocklength, asynchronism level, and SNR for achieving reliable 
communication. 

Index Terms 

Asynchronous communication, detection and isolation problem, discrete-time communication, error exponent, 
low probability of detection, point-to-point communication, quickest detection, sequential analysis, sparse communication,
stopping times



I. Introduction 

A common assumption in information theory is that 'whenever the transmitter speaks the 
receiver listens.' In other words, in general, there is the assumption of perfect synchronization 
between the transmitter and the receiver and, basic quantities, such as the channel capacity, are 
defined under this hypothesis [13]. In practice this assumption is rarely fulfilled. Time uncertainty
due, for instance, to bursty sources of information often causes asynchronous communication, 
i.e., communication for which the receiver has only a partial knowledge of when information is 
sent. 
Fig. 1. Communication is carried over a discrete memoryless channel. When 'no information' is sent the input of the channel is the '*'
symbol. 

There are, however, notable channels for which asynchronism effects have been studied from 
an information theoretic standpoint. An example is the multiple access channel (see, e.g., 0, 
flU, lfl2ll . ifToTO for which the capacity region has been computed under various assumptions on 
the users' asynchronism. Another important example is the insertion, deletion, and substitution 
channel for which only bounds on the capacity are known (see, e.g., 0J, CZL IHL 10). 

In this paper we propose an information theoretic framework that models users' asynchronism 
for point-to-point discrete-time communication without feedback. We consider the situation 
where the transmitter may start sending information at a time unknown to the receiver. The 
time transmission starts is assumed to be uniformly distributed within a certain interval, which 
defines the asynchronism level between the transmitter and the receiver. A suitable notion of 
rate is introduced and scaling laws between block message size and asynchronism level are 
given for which reliable communication can or cannot be achieved.¹ Our first result is the
characterization of the highest asynchronism level with respect to the codeword length under 
which reliable communication can still be achieved. This limit is attained by a coding strategy 
that operates at vanishing rate. This strategy also allows for communication at positive rates 
while operating at asynchronism levels that are exponentially larger than the codeword length. 

In Section II we formally introduce our model and draw connections with the related
'detection and isolation' problem in sequential analysis. Section III contains our main results,
Section IV is devoted to the proofs, and we end with final remarks in Section V. The proofs
often make use of large deviations type bounding techniques for which we refer the reader to
[5, Chapters 1.1 and 1.2] or [4, Chapter 12].

II. Problem formulation and background 

We consider discrete-time communication over a discrete memoryless channel characterized
by its finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, transition probability matrix
$Q(y|x)$, for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$, and 'noise' symbol $\star \in \mathcal{X}$ (see Fig. 1).² The codebook
consists of $M \ge 2$ equally likely codewords of length $N$ composed of symbols from $\mathcal{X}$,
possibly also the $\star$ symbol. The transmission of a particular codeword starts at a random time
$\nu$, independent of the codeword to be sent, uniformly distributed in $\{1, 2, \ldots, A\}$, where the
integer $A \ge 1$ characterizes the asynchronism level. We assume that the receiver knows $A$ but
not $\nu$. If $A = 1$ the channel is said to be synchronized. Throughout the paper, whenever we
refer to the capacity of a channel, it is intended to be the capacity of the synchronized channel.
Throughout the paper we only consider channels $Q$ with strictly positive capacity $C(Q)$.

Before and after the transmission of the information, i.e., before time $\nu$ and after time
$\nu + N - 1$, the receiver observes noise. Specifically, conditioned on the value of $\nu$ and on the message
$m$ to be conveyed, the receiver observes independent symbols $Y_1, Y_2, \ldots$ distributed as follows.
If $i \le \nu - 1$ or $i \ge \nu + N$, the distribution is $Q(\cdot|\star)$. At any time $i \in \{\nu, \nu + 1, \ldots, \nu + N - 1\}$

1 We refer to 'reliable communication' whenever arbitrarily low error probability can be achieved.
2 Throughout the paper we always assume that for all $y \in \mathcal{Y}$ there is some $x \in \mathcal{X}$ for which $Q(y|x) > 0$.
Fig. 2. Time representation of what is sent (upper arrow) and what is received (lower arrow). The '*' represents the 'noise' symbol. At
time $\nu$ message $m$ starts being sent and decoding occurs at time $\tau$.



the distribution is $Q(\cdot|c_{i-\nu+1}(m))$, where $c_n(m)$ denotes the $n$th symbol of the codeword $c^N(m)$
assigned to message $m$.

The decoder consists of a sequential test $(\tau, \phi)$, where $\tau$ is a stopping time with respect
to the output sequence $Y_1, Y_2, \ldots$³ indicating when decoding happens, and where $\phi$ denotes a
decision rule that declares the decoded message (see Fig. 2).⁴

We are interested in reliable and quick decoding. To that aim we first define the average
decoding error probability as

$$\mathbb{P}(\mathcal{E}) = \frac{1}{MA} \sum_{m=1}^{M} \sum_{l=1}^{A} \mathbb{P}_{m,l}(\mathcal{E})$$

where $\mathcal{E}$ indicates the event that the decoded message does not correspond to the sent message,
and where the subscripts $m,l$ indicate the conditioning on the event that message $m$ starts being
sent at time $l$. Second, we define the average communication rate with respect to the average
delay it takes the receiver to react to a sent message, i.e.

$$R = \frac{\ln M}{\mathbb{E}(\tau - \nu)^+}$$

with

$$\mathbb{E}(\tau - \nu)^+ = \frac{1}{MA} \sum_{m=1}^{M} \sum_{l=1}^{A} \mathbb{E}_{m,l}(\tau - l)^+$$

where $x^+$ denotes $\max\{0, x\}$, and where $\mathbb{E}_{m,l}$ denotes the expectation with respect to $\mathbb{P}_{m,l}$.
With the above definitions we now introduce the notion of achievable rate with respect to a 
certain asynchronism level as well as the notion of synchronization threshold. 
Definition 1. An asynchronism exponent $\alpha$ is achievable at a rate $R$ if, for any $\epsilon > 0$, there
exists a block code with (sufficiently large) codeword length $N$, operating under asynchronism
level $A = e^{(\alpha - \epsilon)N}$, while yielding a rate at least as large as $R - \epsilon$ and an error probability
$\mathbb{P}(\mathcal{E}) \le \epsilon$. The supremum of the set of asynchronism exponents that are achievable at rate $R$ is
denoted $\alpha(R, Q)$.

Note that, for a given channel $Q$, the asynchronism exponent function $\alpha(R, Q)$ is non-increasing
in $R$.

Definition 2. The synchronization threshold of a channel $Q$, denoted by $\alpha(Q)$, is the supremum
of the set of achievable asynchronism exponents at all rates, i.e., $\alpha(Q) = \alpha(R = 0, Q)$.



3 Recall that a stopping time $\tau$ is an integer-valued random variable with respect to a sequence of random variables $\{Y_i\}_{i=1}^{\infty}$ so that the
event $\{\tau = n\}$, conditioned on $\{Y_i\}_{i=1}^{n}$, is independent of $\{Y_i\}_{i=n+1}^{\infty}$ for all $n \ge 1$.

4 Formally $\phi$ is an $\mathcal{F}_\tau$-measurable map where $\mathcal{F}_1, \mathcal{F}_2, \ldots$ is the natural filtration induced by the process $Y_1, Y_2, \ldots$
5 In our model one message is sent in a certain interval with probability one. An interesting extension of this model that we did not consider 
is to give some probability to the event where no message is sent. The receiver knows that with some probability 1 — p a message starts 
being sent within a certain interval and that with probability p no message is sent. 
6 Here $\ln$ denotes the natural logarithm.
Throughout the paper we often use the terminology 'coding strategy' or 'coding scheme' to
denote an infinite sequence of pairs codebook/decoder labeled by the blocklength. In particular,
whenever we refer to a coding strategy that 'achieves a certain rate,' it is intended to be
asymptotically in the limit $N \to \infty$.

Let us comment on the above bursty communication model and its associated notions of rate 
and synchronization threshold. First observe that we do not introduce a feedback channel from 
the receiver to the transmitter. With a noiseless feedback it is possible to inform the transmitter 
of the receiver's decoding time, say in the form of ack/nack, therefore allowing the sending of 
multiple messages instead of just one as in our model. Here the noiseless assumption is crucial. 
If the feedback is noisy, the receiver's decision may be wrongly recognized by the transmitter, 
which possibly may result in a loss of message synchronization between transmitter and receiver 
(say the receiver hasn't yet decoded the first message while the transmitter has already started 
to emit the second one). Therefore, in order to avoid a potential second source of asynchronism, 
we omit feedback in our study and limit transmission to only one message. 

The reason for defining the rate with respect to the average delay $\mathbb{E}(\tau - \nu)^+$ (see the definition of $R$ above) is
motivated by the following considerations. At first sight, a natural measure of delay may be the
codeword length $N$. However, in light of the use of sequential decoding, the codeword length
does not provide a measure of the delay needed for the information to be reliably decoded.
Another candidate for the delay one might consider is $\mathbb{E}(\tau)$ or, equivalently, $\mathbb{E}\nu + \mathbb{E}(\tau - \nu)$. The
fact that this delay takes into account the initial offset $\mathbb{E}\nu$ can be regarded as a weakness since
this offset can be influenced neither by the transmitter nor by the receiver. Also, with such a
delay measure, in the regime of positive asynchronism exponents we are interested in, the rate
is always (asymptotically) vanishing for any reliable coding strategy.⁷ Instead, we propose to
consider $\mathbb{E}(\tau - \nu)^+$, the average time the transmitter needs to wait until the receiver makes a
decision. Also note that, in the definition of achievable rate (Definition 1), we choose to grow
$A$ with $N$. Indeed, when $A$ is fixed the problem becomes trivial: by using sufficiently long
codewords and simply decoding at the (fixed) time $A + N - 1$ the asynchronism effect on the
rate can be made negligible.

We now briefly discuss the notion of synchronization threshold. This threshold is defined
with respect to zero rate coding strategies, that is strategies for which $\ln M/\mathbb{E}(\tau - \nu)^+$ tends to
zero (as $N \to \infty$). However, because $\mathbb{E}(\tau - \nu)^+$ and $N$ need not coincide in general, zero rate
coding strategies need not, in general, yield a vanishing fraction $\ln M/N$ as $N$ tends to infinity.
Indeed, as we will see, one can operate arbitrarily closely to the synchronization threshold while
having $\ln M/N$ asymptotically bounded away from zero.

Perhaps the closest sequential decision problem our model relates to is a generalization 
of the change-point problem, often called the 'detection and isolation problem,' introduced by 
Nikiforov in 1995 (see [11], [10] and [2] for a survey). A process $Y_1, Y_2, \ldots$ starts with some
initial distribution and changes it at some unknown time. The post-change distribution can be
any of a given set of $M$ distributions. By sequentially observing $Y_1, Y_2, \ldots$ the goal is to quickly
react to the statistical change and isolate its cause, i.e., the post-change distribution. Hence, our
synchronization problem takes the form of a detection and isolation problem where the change
in distribution is induced by the transmitted message. However, to the best of our knowledge
studies related to the detection and isolation problem usually assume that once the observed
process jumps into one of its post-change distributions, it remains in that state forever. This
means that, eventually, if we wait long enough, a correct decision is possible. Instead, in the

7 To see this consider the rate defined as $\ln M/(\mathbb{E}\nu + \mathbb{E}(\tau - \nu))$. To achieve vanishing error probability as $M$ (or $N$) tends to infinity, the
reaction delay $\mathbb{E}(\tau - \nu)$ must grow at least linearly with $\ln M$ (if not this would imply that reliable communication above capacity would
be possible). Similarly, $M$ and $N$ must satisfy $N > \ln M$. Also, in the regime of positive asynchronism exponents, i.e., when $A = e^{N\alpha}$
for some $\alpha > 0$, we have $\mathbb{E}\nu = e^{N\alpha}/2$ since $\nu$ is uniformly distributed in $\{1, 2, \ldots, A\}$. Therefore, in the regime of positive asynchronism
exponents, the rate $\ln M/(\mathbb{E}\nu + \mathbb{E}(\tau - \nu))$ is vanishing as $N \to \infty$ for any coding strategy that achieves arbitrarily low error probability.






synchronization problem the change in distribution is local since it only lasts the duration of a
codeword length. In particular once the codeword is 'missed' no recovery is possible. Finally,
optimal decoding rules for the detection and isolation problem seem to have been obtained
only in the limit of small error probabilities $\mathbb{P}(\mathcal{E})$ while keeping $M$, the number of post-change
distributions, fixed.⁸ In our case we typically let $M$ grow as $(1/\mathbb{P}(\mathcal{E}))^{\zeta}$, for some $\zeta > 0$.

III. Results 

Our first result is the characterization of the synchronization threshold. 
Theorem 1. For any discrete memoryless channel $Q$, the synchronization threshold as given in
Definition 2 is given by

$$\alpha(Q) = \max_x D(Q(\cdot|x) \| Q(\cdot|\star))$$

where $D(Q(\cdot|x) \| Q(\cdot|\star))$ is the divergence (Kullback-Leibler distance) between $Q(\cdot|x)$ and
$Q(\cdot|\star)$. Furthermore, any asynchronism exponent $\alpha < \alpha(Q)$ can be achieved by a coding
strategy that yields $\lim_{N \to \infty} \ln M/N > 0$.

The theorem says that vanishing error probability can be achieved as the blocklength $N$ tends to
infinity if the asynchronism level grows as $e^{N\alpha}$ where $\alpha < \max_x D(Q(\cdot|x) \| Q(\cdot|\star))$. Conversely, any
coding strategy that operates at an asynchronism exponent $\alpha > \max_x D(Q(\cdot|x) \| Q(\cdot|\star))$ cannot achieve
arbitrarily low error probability. The second part of the theorem shows the distinction between
the delay measured by the codeword length $N$ and by the expected 'reaction time' $\mathbb{E}(\tau - \nu)^+$.
Arbitrarily close to the synchronization threshold one can (asymptotically) guarantee $\ln M/N$
to be strictly positive, while the question remains open for the rate $\ln M/\mathbb{E}(\tau - \nu)^+$. Specifically,
it remains to be seen whether $\alpha(Q) = \lim_{R \downarrow 0} \alpha(R, Q)$ (assuming $\alpha(Q) < \infty$). This issue will
be discussed in Section III-B.

At least some connections between channel capacity and synchronization threshold exist. 
Although these two quantities are not directly related, both refer to limits on hypothesis dis- 
crimination. The first concerns a purely isolation problem whereas the second concerns an almost 
purely detection problem (since there is no rate constraint). It may be interesting to note that 
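the synchronization threshold $\alpha(Q)$ is always at least as large as $C(Q)$. To see this let $P$ be the
capacity achieving distribution of the (synchronized) channel $Q$. It is well known [4, Lemma
13.8.1] that for any distribution $V$ on $\mathcal{Y}$

$$D(PQ \| PP_Y) \le D(PQ \| PV)$$

where $P_Y$ is the right marginal of $PQ = P(\cdot)Q(\cdot|\cdot)$. Letting $V = Q(\cdot|\star)$ we get

$$\begin{aligned}
C(Q) &= D(PQ(\cdot|\cdot) \| PP_Y) \\
&\le D(PQ(\cdot|\cdot) \| PQ(\cdot|\star)) \\
&= \sum_x P(x) \sum_y Q(y|x) \ln \frac{Q(y|x)}{Q(y|\star)} \\
&\le \max_x D(Q(\cdot|x) \| Q(\cdot|\star)) \\
&= \alpha(Q).
\end{aligned}$$

Finally it can be checked that if $C(Q) = 0$ then $\alpha(Q) = 0$.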
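As a numerical illustration of the inequality between capacity and synchronization threshold, the sketch below evaluates the closed form of Theorem 1 for a channel given as a table of transition probabilities. The function names, the dictionary representation of the channel, and the binary symmetric example (crossover 0.1, with input 0 playing the role of the noise symbol) are assumptions made for illustration only; they do not appear in the paper.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; returns +inf if p puts mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def sync_threshold(channel, star):
    """max over inputs x of D(Q(.|x) || Q(.|star)), the closed form of Theorem 1."""
    return max(kl_divergence(row, channel[star]) for row in channel.values())

def mutual_information(input_dist, channel):
    """I(PQ) = sum_x P(x) D(Q(.|x) || (PQ)_Y), in nats."""
    n_out = len(next(iter(channel.values())))
    out = [sum(input_dist[x] * channel[x][y] for x in channel) for y in range(n_out)]
    return sum(input_dist[x] * kl_divergence(channel[x], out)
               for x in channel if input_dist[x] > 0.0)

# Hypothetical example: binary symmetric channel with crossover 0.1,
# where input 0 is taken to be the noise symbol.
eps = 0.1
bsc = {0: [1 - eps, eps], 1: [eps, 1 - eps]}
alpha = sync_threshold(bsc, star=0)
# The uniform input achieves the capacity of a binary symmetric channel.
capacity = mutual_information({0: 0.5, 1: 0.5}, bsc)
print(f"C(Q) = {capacity:.4f} nats, alpha(Q) = {alpha:.4f} nats")
```

For this example the threshold is noticeably larger than the capacity, in line with the chain of inequalities above.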

8 Here optimal decoding rules refer to sequential tests yielding minimum reaction delay, usually a function of $\tau - \nu$, given a certain error
probability.



Fig. 3. Antipodal signaling over a Gaussian channel with hard decision at the decoder. 



Example: the Gaussian channel 

As an application of Theorem 1 we consider antipodal signaling over a Gaussian channel and
derive a necessary condition between asynchronism level, block length, and signal to noise
ratio (SNR) for achieving reliable communication. Suppose communication takes place over an
additive channel $X \to Y = X + Z$ where $X$ denotes the input, $Y$ the output, and where $Z$ is
a normally distributed random variable, independent of $X$, with zero mean and unit variance.
We consider antipodal signaling, that is $c_i(m) = \pm\sqrt{\mathrm{SNR}}$ for all $i \in \{1, 2, \ldots, N\}$ and $m \in
\{1, \ldots, M\}$, where the SNR is some positive constant. Before decoding, the receiver makes
a hard decision on each received symbol and declares $+1$ if $Y_i \ge 0$ and $-1$ if $Y_i < 0$. The
noise symbol $\star$ equals zero meaning that when no information is sent the receiver declares
$+1$ or $-1$ with probability 1/2. The inputs $+\sqrt{\mathrm{SNR}}$ and $-\sqrt{\mathrm{SNR}}$ are received correctly with
probability $1 - \epsilon$ and are flipped with probability $\epsilon$, where $\epsilon = e^{-\frac{\mathrm{SNR}}{2}(1 + o(1))}$ as the SNR tends
to infinity. The discrete channel $Q$ that results from the hard decision procedure is depicted in
Fig. 3. From Theorem 1 any coding strategy that yields vanishing error probability satisfies
$\limsup_{N \to \infty} \frac{1}{N} \ln A \le \alpha(Q)$ where

$$\begin{aligned}
\alpha(Q) &= \max_x D(Q(\cdot|x) \| Q(\cdot|\star)) \\
&= \ln 2 - H(\epsilon) \\
&= \ln 2 - H\big(e^{-\frac{\mathrm{SNR}}{2}(1 + o(1))}\big) \quad \text{as } \mathrm{SNR} \to \infty
\end{aligned}$$

with $H(\epsilon) = -\epsilon \ln \epsilon - (1 - \epsilon) \ln(1 - \epsilon)$. Therefore, as $N$ tends to infinity, in order to achieve
reliable communication it is necessary that

$$\frac{1}{N} \ln A \le \ln 2 - H\big(e^{-\frac{\mathrm{SNR}}{2}(1 + o_1(1))}\big) + o_2(1)$$

where $o_1(1)$ and $o_2(1)$ are vanishing functions of the SNR and of $N$, respectively. Because of
the chosen quantization, in the limit of high SNR we have $\frac{1}{N} \ln A \le \ln 2$, and an increase in the
power results in a negligible increase of the asynchronism level for which reliable communication
is possible (for fixed blocklength). To exploit power at high SNR it is necessary to have a finer
quantization at the output. Finally notice that for this (quantized) channel the synchronization
threshold coincides with the channel capacity. □
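The saturation at $\ln 2$ can be seen numerically. The following sketch computes the exact flip probability of the hard-decision receiver, $\mathbb{P}(Z > \sqrt{\mathrm{SNR}})$ for a standard normal $Z$, and the resulting threshold $\ln 2 - H(\epsilon)$. The function names are illustrative assumptions, and `max_asynchronism` keeps only the first-order exponent, ignoring the vanishing $o(1)$ terms of the necessary condition.

```python
import math

def binary_entropy(e):
    """H(e) in nats, with the convention 0 ln 0 = 0."""
    if e in (0.0, 1.0):
        return 0.0
    return -e * math.log(e) - (1 - e) * math.log(1 - e)

def flip_probability(snr):
    """P(Z > sqrt(snr)) for Z ~ N(0, 1): a transmitted sign is flipped by the hard decision."""
    return 0.5 * math.erfc(math.sqrt(snr / 2.0))

def hard_decision_threshold(snr):
    """Synchronization threshold ln 2 - H(eps) of the quantized channel, in nats."""
    return math.log(2) - binary_entropy(flip_probability(snr))

def max_asynchronism(snr, blocklength):
    """First-order estimate exp(N * alpha(Q)) of the largest tolerable asynchronism level."""
    return math.exp(blocklength * hard_decision_threshold(snr))

for snr in (1.0, 4.0, 16.0, 64.0):
    print(f"SNR = {snr:5.1f}  alpha(Q) = {hard_decision_threshold(snr):.6f}  (ln 2 = {math.log(2):.6f})")
```

Under these assumptions the threshold is already very close to $\ln 2$ at moderate SNR, so further power brings almost no gain in the exponent.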

While we do not characterize the asynchronism exponent function $\alpha(R, Q)$ for $R > 0$, Theorem 2
provides a non-trivial lower bound on $\alpha(R, Q)$, for any $R \in [0, C(Q))$.

We use the notation $(PQ)_Y$ to denote the right marginal of a joint distribution $P(\cdot)Q(\cdot|\cdot)$
and, given a joint distribution $J$ on $\mathcal{X} \times \mathcal{Y}$, we denote by $I(J)$ the mutual information induced
by $J$. Also we denote by $\mathcal{P}_{\mathcal{Y}|\mathcal{X}}$ the set of conditional distributions of the form $V(y|x)$ with
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$.
Theorem 2. Let $Q$ be a discrete memoryless channel. If for some constants $\alpha \ge 0$, $t_1 \ge 0$,
$t_2 > 1$, and input distribution $P$, with $I(PQ) > 0$, the following inequalities

a. $\alpha < \inf_{V \in \mathcal{P}_{\mathcal{Y}|\mathcal{X}} :\, \ldots} D((PV)_Y \| (PQ)_Y)$

b. $\alpha < \min_{V \in \mathcal{P}_{\mathcal{Y}|\mathcal{X}} :\, \ldots} D(PV \| PQ)$

c. $\dfrac{t_1}{t_2} < \dfrac{D((PQ)_Y \| Q(\cdot|\star))}{I(PQ)}$

are satisfied for some $\delta \in (0, 1)$, then the rate $I(PQ)/t_2$ is achievable at an asynchronism
exponent $\alpha$.

Note that the conditions a and b in Theorem [2] are easy to check numerically since they only 
involve convex optimizations. Also notice, on the right hand side of the inequality b, the sphere 
packing exponent function — of the channel Q with input distribution P — evaluated at gu^^zfj 
(see [5, p. 166]). 

Corollary. For any channel $Q$ with capacity $C(Q) > 0$, any rate $R \in (0, C(Q))$ can be achieved
at a strictly positive asynchronism exponent.

Proof of the Corollary: Consider the inequalities a, b, and c from Theorem 2. First choose
some $P$ and $t_2 > 1$ so that $I(PQ)/t_2 > R$ and so that $(PQ)_Y \ne Q(\cdot|\star)$ (this is always possible
since $C(Q) > 0$). By setting $t_1 = 0$ the inequality c holds (since its right hand side is strictly
positive). Also inequality a holds for any finite $\alpha$ (the infimum equals infinity). For the inequality
b, observe that its right hand side is a decreasing function of $\alpha$ and has a strictly positive value
at $\alpha = 0$ (since $I(PQ) > 0$). It follows that inequality b holds for strictly positive and small
enough values of $\alpha$. ■



A. Coding for asynchronous channels 

In this section we present the coding scheme from which one deduces Theorem 2 and the
direct part of Theorem 1. As we will see, our scheme does not subdivide the synchronization
problem into a detection problem followed by a message isolation problem: detection and
isolation are treated jointly.

The codebook is randomly generated according to some distribution $P$. If the aim is only to
reliably communicate at a certain asynchronism exponent $\alpha$, there are some degrees of freedom
in choosing $P$. One possible choice is to pick a $P$ that satisfies

$$D((PQ)_Y \| Q(\cdot|\star)) + I(PQ) - \ln M/N > \alpha$$

with $D((PQ)_Y \| Q(\cdot|\star)) > 0$ and $I(PQ) > 0$, where $M$ represents the size of the message
set and $N$ the size of the codewords (see proof of Proposition 2). In the regime where the
asynchronism exponent is close to $\alpha(Q)$ the codewords are mainly composed of the symbol
$\arg\max_x D(Q(\cdot|x) \| Q(\cdot|\star))$. Indeed, in this asynchronism regime, the main source of error
comes from a miss-detection of the sent codeword, later referred to as 'false-alarm.' We deal
with this source of error by distilling information using codewords with (mostly) symbols that
induce output distributions that are 'as far as possible' from the output distribution induced by
the $\star$ symbol. Finally if the aim is to accommodate both rate and asynchronism constraints, the
distribution $P$ has to satisfy the conditions explicitly stated in Theorem 2.

For the decoder, let us observe first that our communication model admits two sources 
of error. The first comes from an atypical behavior of the noise during the period when no 
information is conveyed, which may result in a false-alarm. The second comes from an atypical 






behavior of the channel during information transmission, which may result in a miss-isolation 
of the sent codeword. These two sources of error depend on the asynchronism level as well as 
on the communication rate: the higher the asynchronism the higher the first source of error, the 
higher the communication rate the higher the second source of error. Accordingly, our decoder 
is the combination of two criteria parameterized by constants that are chosen based on the level 
of asynchronism and according to the rate we aim at. 

More specifically, the decoder observes the channel outputs $Y_1, Y_2, \ldots$ and makes a decision
as soon as it observes $i$ consecutive output symbols, with $i \in \{1, 2, \ldots, N\}$, that simultaneously
satisfy two conditions. The first condition is that these symbols should look 'sufficiently different' 
from the noise, as measured by the divergence. The second condition is that these symbols must 
be sufficiently correlated, in a mutual information sense, with one of the codewords. We formalize 
this below. 

For $j \ge i$ we write $x_i^j$ for $x_i, x_{i+1}, \ldots, x_j$. If $i = 1$ we use the shorthand notation $x^j$ instead
of $x_1^j$. Given a pair $(x^n, y^n)$ let us denote by $\hat{P}_{(x^n, y^n)}$ the empirical distribution of $(x^n, y^n)$, i.e.,
$\hat{P}_{(x^n, y^n)}(x, y) = \frac{1}{n} \sum_{i=1}^{n} 1_{(x,y)}(x_i, y_i)$ where $1_{(x,y)}(x_i, y_i) = 1$ if $(x_i, y_i) = (x, y)$, else equals
zero. To each message $m \in \{1, 2, \ldots, M\}$ associate the stopping time

$$\tau_m = \inf\Big\{ n \ge 1 : \exists\, i \in \{1, \ldots, N\} \text{ so that } i D\big(\hat{P}_{y_{n-i+1}^{n}} \,\big\|\, Q(\cdot|\star)\big) \ge t_1 \ln M \text{ and }
\min_{k \in [1, \ldots, i]} \Big[ k I\big(\hat{P}_{c^k(m),\, y_{n-i+1}^{n-i+k}}\big) + (i - k) I\big(\hat{P}_{c_{k+1}^{i}(m),\, y_{n-i+k+1}^{n}}\big) \Big] \ge t_2 \ln M \Big\} \qquad (4)$$

where $t_1 \ge 0$ and $t_2 > 1$ are some fixed threshold constants to be appropriately chosen according
to the asynchronism level and desired communication rate. The decoding is made at time

$$\tau = \min_{m \in [1, 2, \ldots, M]} \tau_m$$

and the message $m$ that is declared is any that satisfies $\tau_m = \tau$.
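To make the two-condition stopping rule concrete, here is a minimal brute-force sketch that scans the output sequence and applies a divergence test and a mutual-information test to every window of at most $N$ trailing symbols. It is only an illustration under stated assumptions: the function names, the representation of the noise distribution as a dictionary, and the convention that the mutual information of an empty sequence is zero are not taken from the paper, and no attempt is made at efficiency.

```python
import math
from collections import Counter

def empirical(seq):
    """Empirical distribution (type) of a sequence, as a dict symbol -> frequency."""
    n = len(seq)
    return {s: c / n for s, c in Counter(seq).items()}

def divergence_from_noise(window, noise_dist):
    """D(P_hat_y || Q(.|star)) in nats; +inf if a symbol is impossible under noise."""
    total = 0.0
    for y, p in empirical(window).items():
        q = noise_dist.get(y, 0.0)
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total

def empirical_mi(xs, ys):
    """Mutual information of the joint type of (xs, ys); taken as 0 for empty input (assumption)."""
    if not xs:
        return 0.0
    joint = empirical(list(zip(xs, ys)))
    px, py = empirical(xs), empirical(ys)
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items())

def passes(codeword, window, noise_dist, t1, t2, log_m):
    """Check both conditions of the stopping rule for one codeword and one window."""
    i = len(window)
    if i * divergence_from_noise(window, noise_dist) < t1 * log_m:
        return False
    # Minimum over k of the split mutual-information statistic.
    return all(
        k * empirical_mi(codeword[:k], window[:k])
        + (i - k) * empirical_mi(codeword[k:i], window[k:]) >= t2 * log_m
        for k in range(1, i + 1)
    )

def decode(outputs, codebook, noise_dist, t1, t2):
    """Return (stopping time, message index), or None if the rule never fires."""
    log_m = math.log(len(codebook))
    n_len = len(codebook[0])
    for n in range(1, len(outputs) + 1):
        for i in range(1, min(n, n_len) + 1):
            window = outputs[n - i:n]  # the i most recent output symbols
            for m, codeword in enumerate(codebook):
                if passes(codeword, window, noise_dist, t1, t2, log_m):
                    return n, m
    return None
```

When several messages satisfy the rule at the same time, this sketch returns the first one found, which is one admissible reading of "any that satisfies $\tau_m = \tau$".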

It should be emphasized that there may be other sequential decoders that also achieve 
the synchronization threshold. The one we propose has the property that it also allows for 
communication at positive rates and positive asynchronism exponents. Also, an interesting feature 
of the above decoder is that, in addition to operating in an asynchronous setting, it is also almost 
universal in the sense that its rule does not depend on the channel statistics, except for the
noise distribution $Q(\cdot|\star)$. In fact this decoder is an extension of a sequential universal decoder
introduced in [15, eq. (10)] for the synchronized setting.

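In the context of asynchronous communication, the same decoding rule as above is considered
in [14], but without the divergence condition, i.e., a decision is made as soon as for some $m$
and $i$ the condition

$$\min_{k \in [1, \ldots, i]} \Big[ k I\big(\hat{P}_{c^k(m),\, y_{n-i+1}^{n-i+k}}\big) + (i - k) I\big(\hat{P}_{c_{k+1}^{i}(m),\, y_{n-i+k+1}^{n}}\big) \Big] \ge t_2 \ln M$$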



9 It may seem to the reader that the mutual information condition in (4), given by

$$\min_{k \in [1, \ldots, i]} \Big[ k I\big(\hat{P}_{c^k(m),\, y_{n-i+1}^{n-i+k}}\big) + (i - k) I\big(\hat{P}_{c_{k+1}^{i}(m),\, y_{n-i+k+1}^{n}}\big) \Big] \ge t_2 \ln M,$$

is convoluted, and that it could be replaced, for instance, by

$$i I\big(\hat{P}_{c^i(m),\, y_{n-i+1}^{n}}\big) \ge t_2 \ln M.$$

Our choice is motivated by a technical consideration related to the false-alarm event induced by $i$ last symbols that are generated partly
inside and partly outside the transmission period (see Case II of the proof of Lemma 2).
Fig. 4. A binary symmetric channel has a sphere packing bound at zero rate, $E_{sp}(R = 0, Q)$, given by $\max_P \min_{V : I(PV) = 0} D(PV \| PQ)$,
that can be smaller than $\alpha(Q)$. Specifically, Theorem 1 yields $\alpha(Q) = \epsilon \ln[\epsilon/(1 - \epsilon)] + (1 - \epsilon) \ln[(1 - \epsilon)/\epsilon]$ and it can be checked
that $E_{sp}(R = 0, Q) \le 0.5 \ln[0.5/(1 - \epsilon)] + 0.5 \ln[0.5/\epsilon]$. Therefore $E_{sp}(R = 0, Q) \le 0.5(1 + o(1))\alpha(Q)$ as $\epsilon \to 0$.




Fig. 5. Example of a channel for which $\alpha(Q, 0) = \lim_{R \downarrow 0} \alpha(Q, R)$.



holds. With the mutual information condition alone, however, it was not possible to prove that 
reliable communication can be achieved for asynchronism exponents higher than the capacity 
of the channel. 

B. Continuity of $\alpha(\cdot, Q)$ at $R = 0$

We discuss the continuity of $\alpha(\cdot, Q)$ at $R = 0$ in light of Theorem 2. The right hand side
of inequality b, the sphere packing bound, is associated to the miss-isolation error event of the
sent codeword associated with the coding scheme discussed in Section III-A (this will be seen in the
proof of Theorem 2). Therefore, regardless of the rate, any achievable asynchronism exponent
$\alpha$ obtained via Theorem 2 is bounded by the sphere packing exponent at zero rate, which can
be smaller than the synchronization threshold (see Fig. 4 for an example). This motivates the
conjecture that $\alpha(Q, 0) \ne \lim_{R \downarrow 0} \alpha(Q, R)$ in general.
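The gap between the two quantities for a binary symmetric channel can be checked numerically from the two closed-form expressions quoted in the caption of Fig. 4. The sketch below simply evaluates those expressions; the function names are illustrative assumptions, and the second function is the upper bound on the zero-rate sphere packing exponent stated in the caption, not the exponent itself.

```python
import math

def bsc_sync_threshold(eps):
    """alpha(Q) for a binary symmetric channel with crossover eps (Fig. 4 caption)."""
    return eps * math.log(eps / (1 - eps)) + (1 - eps) * math.log((1 - eps) / eps)

def zero_rate_bound(eps):
    """Upper bound on the zero-rate sphere packing exponent quoted in the Fig. 4 caption."""
    return 0.5 * math.log(0.5 / (1 - eps)) + 0.5 * math.log(0.5 / eps)

for eps in (1e-2, 1e-4, 1e-8):
    ratio = zero_rate_bound(eps) / bsc_sync_threshold(eps)
    print(f"eps = {eps:.0e}  bound / alpha(Q) = {ratio:.4f}")
```

The ratio stays below one half and approaches it only slowly (logarithmically in $1/\epsilon$), which is consistent with the $0.5(1 + o(1))$ statement.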

Note that there are channels for which the asynchronism exponent function is continuous at
zero rate, such as the one given in Fig. 5. Indeed, in this case $\alpha(Q) = \ln 2$ by Theorem 1. Then,
considering the three inequalities given in Theorem 2, let $t_1 = 0$ and let the input distribution
$P$ be defined as $P(1) = p = 1 - P(0)$ for some fixed $p \in (0, 1/2)$. With this choice of $t_1$ and
$P$ the inequality a holds for any finite $\alpha$ (the infimum is infinite) and inequality c holds for any
$t_2 > 1$ since its right hand side is strictly positive. We now focus on the inequality b. Observe
that any channel $V \ne Q$ with inputs 0 and 1 gives $D(PV \| PQ) = +\infty$. Therefore, for any
$\delta \in (0, 1)$ and $t_2 > 1$ the right hand side of the inequality b is infinite if $Q$ satisfies

i£t) < I(PQ) • (5) 

and zero otherwise. Now pick an arbitrarily small $\mu > 0$ and choose $P$ with $p$ sufficiently close
to 1/2 so that

$$I(PQ) > \alpha(Q) - \mu/2. \qquad (6)$$

We conclude from (5) and (6) that, by choosing $\delta$ close enough to one and $t_2$ large enough, any
asynchronism exponent

$$\alpha < \alpha(Q) - \mu$$

can be achieved at all rates up to $I(PQ)/t_2$.
Fig. 6. Parsing of the received sequence of maximal length $A + N - 1$ into $s$ blocks $y^{(1)}, y^{(2)}, \ldots, y^{(s)}$ of length $N$, where $s$ is the integer
part of $(A + N - 1)/N$.



IV. Analysis 

In this section we prove the converse and the direct part of Theorem 1. The converse shows
that no coding strategy achieves vanishing error probability while operating at an asynchronism
exponent higher than $\alpha(Q)$. For the direct part we show that the coding scheme proposed in
Section III-A can reliably operate arbitrarily closely to the asynchronism exponent $\alpha(Q)$. By
extending the analysis of this scheme we will prove Theorem 2. The difference between the
achievability schemes of Theorems 1 and 2 lies in the codebooks. For Theorem 1 the codebook
is randomly generated according to a certain distribution $P$, while for Theorem 2 we impose
that each codeword is (essentially) of constant composition $P$ uniformly over its length.
Proposition 1 (Converse). Suppose that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$. Then no coding strategy
achieves an asynchronism exponent strictly greater than

$$\max_x D(Q(\cdot|x) \| Q(\cdot|\star)).$$

Proposition 1 assumes that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$. Indeed, if $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$ it
will be shown in Proposition 2 that reliable communication can be achieved irrespective of the
exponential growth rate of the asynchronism level with respect to the blocklength.

Proof of Proposition 1: Suppose there are two equally likely messages, $m$ and $m'$, and
that the decoder is given the sequence of maximal length $y_1, y_2, \ldots, y_{A+N-1}$. We make the
hypothesis that each codeword $c(m)$ and $c(m')$ uses one symbol repeated $N$ times. The case
where each codeword uses multiple symbols is obtained by a straightforward extension of the
single symbol case and is therefore omitted. Also, we optimistically assume that the receiver is
cognizant of the fact that the sent message is delivered during one of the $s$ distinct time slots
of duration $N$, where $s$ is the integer part of $(A + N - 1)/N$, as shown in Fig. 6. An easy
computation shows that, given a sequence $y^{A+N-1}$, the maximum a posteriori decoder declares
message $m$ or $m'$ depending on whether the sum

$$\sum_{l=1}^{s} z(y^{(l)})$$

is positive or negative,¹⁰ with

$$z(y^{(l)}) = \frac{Q(y^{(l)}|c(m))}{Q(y^{(l)}|\star)} - \frac{Q(y^{(l)}|c(m'))}{Q(y^{(l)}|\star)} \qquad (7)$$

and where $Q(y^{(l)}|c(m))$ denotes the probability of the $l$th block $y^{(l)}$ of size $N$ given the codeword
$c(m)$, and where $Q(y^{(l)}|\star)$ refers to the same probability now conditioned on the string of $N$
consecutive $\star$. The probability of the error event $\mathcal{E}$ is hence lower bounded as



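$$\mathbb{P}(\mathcal{E}) \ge \frac{1}{2} \left[ \mathbb{P}_m\Big( \sum_{l=1}^{s} z(Y^{(l)}) < 0 \Big) + \mathbb{P}_{m'}\Big( \sum_{l=1}^{s} z(Y^{(l)}) > 0 \Big) \right]$$

where $\mathbb{P}_m$ refers to the probability conditioned on message $m$ being sent. Note that under $\mathbb{P}_m$
and $\mathbb{P}_{m'}$ the $z(Y^{(l)})$ are all i.i.d. according to the noise distribution except for $z(Y^{(\nu)})$ whose
distribution depends on the sent message.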



10 If the sum is zero the decoder declares one of the two messages at random. 

Let $T_m$ be the set of sequences $y^N$ that are strongly typical with respect to $Q(\cdot|c(m))$ [5, 
p.33], i.e., any sequence $y^N \in T_m$ satisfies $|n(y;y^N)/N - Q(y|c(m))| \le \mu$ where $n(y;y^N)$ is 
the number of times the symbol $y$ appears in $y^N$. We choose the strong typicality constant $\mu$ 
so that $0 < \mu < 1$ and the blocklength $N$ large enough that $\mathbb{P}_m(Y^{(\nu)} \in T_m) \ge 1 - \mu$. We 
define $T_{m'}$ analogously. Further, we define $h$ to be equal to $\max_{y^N \in T_m \cup T_{m'}} |z(y^N)|$. Using the 
independence of $z(Y^{(\nu)})$ and $\sum_{l \ne \nu} z(Y^{(l)})$ under $\mathbb{P}_m$ we get 
\[ \mathbb{P}_m\Big(\sum_{l=1}^{s} z(Y^{(l)}) < 0\Big) \ge (1-\mu)\, \mathbb{P}\Big(\sum_{l \ne \nu} z(Y^{(l)}) < -h\Big) . \]
The sum in the argument of the last term above involves $s-1$ independent random variables 
distributed according to $Q(\cdot|\star)$. For simplicity from now on we denote these random variables 
by $Z_i$ instead of $z(Y^{(i)})$. Applying the same argument under $\mathbb{P}_{m'}$, we then deduce that 
\[ \mathbb{P}(\mathcal{E}) \ge \frac{1-\mu}{2}\, \mathbb{P}\Big(\Big|\sum_{i} Z_i\Big| > h\Big) . \tag{8} \]

In the remaining part of the proof we show that, if $s = e^{(\alpha+\epsilon)N}$ with $\epsilon > 0$ (here $\alpha$ denotes the bound of Proposition 1), the random walk 
$\sum_i Z_i$ crosses $h$ with probability bounded away from zero as $N$ tends to infinity, proving the Proposition. At the 
core of the argument lies the following Lemma whose proof is deferred to the Appendix. 
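Lemma 1. Let $P$ be a distribution over some finite alphabet $\mathcal{A} = \{a_1, a_2, \ldots, a_{|\mathcal{A}|}\}$ and suppose 
that for some integer $s \ge 1$ 
\[ \frac{1}{s\delta_0} \le \min\{P(a_1), P(a_2)\} \]
for some constant $\delta_0 \in (0,1)$. Let $\hat{P}$ be an empirical type$^{11}$ over $\mathcal{A}^s$ so that $\min\{\hat{P}(a_1)/P(a_1), \hat{P}(a_2)/P(a_2)\} \ge \delta_0$ 
and $\hat{P}(a_2) > 1/s$. Let $\tilde{P}$ be defined so that $\tilde{P}(a_1) = \hat{P}(a_1) - 3/s$, $\tilde{P}(a_2) = \hat{P}(a_2) + 3/s$, and 
$\tilde{P}(a_i) = \hat{P}(a_i)$ for any $a_i \in \mathcal{A} \setminus \{a_1, a_2\}$. Then 
\[ P^s(T(\tilde{P})) \ge \delta\, P^s(T(\hat{P})) \]
for some strictly positive constant $\delta = \delta(\delta_0)$, where $P^s$ denotes the product distribution induced 
by $P$ over $\mathcal{A}^s$, and where $T(\hat{P})$ and $T(\tilde{P})$ denote the set of sequences of length $s$ with empirical 
type $\hat{P}$ and $\tilde{P}$, respectively. 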
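
The lemma compares the probabilities of two type classes that differ by moving three occurrences from $a_1$ to $a_2$. As a sanity check, the sketch below computes both type-class probabilities exactly with rational arithmetic and compares their ratio with the closed-form expression used in the Appendix. The alphabet size, the distribution, and the counts are hypothetical values chosen only for illustration.

```python
import math
from fractions import Fraction

def type_class_probability(counts, P):
    """Exact probability, under the product distribution P^s, of the type class
    whose symbol counts are given (s is the sum of the counts)."""
    s = sum(counts)
    coeff = math.factorial(s)
    for c in counts:
        coeff //= math.factorial(c)  # multinomial coefficient, stays an integer
    prob = Fraction(coeff)
    for c, p in zip(counts, P):
        prob *= Fraction(p) ** c
    return prob

def closed_form_ratio(n1, n2, p1, p2):
    """Ratio P^s(T(P~)) / P^s(T(P^)) when three occurrences move from a1 to a2;
    n1 and n2 are the counts of a1 and a2 under the original type."""
    num = (n1 - 2) * (n1 - 1) * n1
    den = (n2 + 1) * (n2 + 2) * (n2 + 3)
    return (Fraction(p2) / Fraction(p1)) ** 3 * Fraction(num, den)

# Hypothetical example with a three-letter alphabet and s = 20.
P = [Fraction(1, 10), Fraction(6, 10), Fraction(3, 10)]
counts_hat = [5, 12, 3]      # s * P^(a) for each symbol
counts_tilde = [2, 15, 3]    # three occurrences moved from a1 to a2

ratio = type_class_probability(counts_tilde, P) / type_class_probability(counts_hat, P)
print(ratio, closed_form_ratio(5, 12, P[0], P[1]))
```

Exact fractions are used so that the two expressions can be compared without rounding error.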

We use the lemma with $\mathcal{A} = \{a : a = z(y^N) \text{ for some } y^N \in \mathcal{Y}^N\}$, $s$ defined as the integer 
part of $e^{N(\alpha+\epsilon)}$ for some arbitrary $\epsilon > 0$, and $P$ defined as $P(a) = \sum_{y^N : z(y^N) = a} Q(y^N|\star)$ for all 
$a \in \mathcal{A}$. Also, we let $a_1 = h$, $a_2$ be the symbol in $\mathcal{A}$ with the highest probability under $P$, and 
$\hat{P}$ be any distribution on $\mathcal{A}$ so that $|1 - \hat{P}(a_i)/P(a_i)| \le \mu$ for $i \in \{1, 2\}$. In the sequel we label such 
distributions $\hat{P}$ as 'typical types.' We now assume that $s$, $P$, $\hat{P}$, $a_1$, and $a_2$ satisfy the hypothesis 
of Lemma 1 and will show it at the end of the proof. 

Suppose by contradiction that the right hand side of (8) goes to zero as $N \to \infty$, i.e., that 
\[ P^s\Big(\Big|\sum_{i} Z_i\Big| \le h\Big) \ge 1 - \rho \tag{9} \]
for any arbitrary $\rho > 0$ and $N$ large enough. Assume for the moment that (9) implies for $N$ 
large enough 
\[ P^s\Big(\Big|\sum_{i} Z_i\Big| \le h,\; Z^s \text{ has a typical type } \hat{P}\Big) \ge 1 - \rho - \mu . \tag{10} \]
This implication will be shown at the end of the proof. Now, for a given typical type $\hat{P}$ let $\tilde{P}$ 
be defined as in Lemma 1. Observe that if $Z^s$ belongs to the event 
\[ \Big\{\Big|\sum_{i=1}^{s} Z_i\Big| \le h,\; Z^s \text{ has typical type } \hat{P}\Big\} \]
then $Z^s$ has a type $\hat{P}$ that yields a $\tilde{P}$ whose type class$^{12}$ belongs to the event$^{13}$ 
\[ \Big\{\Big|\sum_{i=1}^{s} Z_i\Big| > h\Big\} . \]
Hence, from Lemma 1 and (10) there exists some $\delta > 0$ so that 
\[ P^s\Big(\Big|\sum_{i=1}^{s} Z_i\Big| > h\Big) \ge \delta(1 - \mu - \rho) \tag{11} \]
for $N$ large enough, which is in contradiction with (9) for $\rho$ small enough. We conclude that 
$P^s(|\sum_{i=1}^{s} Z_i| > h)$ is asymptotically bounded away from zero, and so is the right hand side of 
(8). 

11 An empirical type over $\mathcal{A}^s$ is a distribution $\hat{P}$ over $\mathcal{A}$ so that $\hat{P}(a)$ is an integer multiple of $1/s$, for all $a \in \mathcal{A}$. 

To conclude the proof we need to justify the step from (9) to (10) and we need to check that 
$P$ and $\hat{P}$ satisfy the hypothesis of the lemma with our choice of $a_1$ and $a_2$. For this last check, 
first note that $z(y^N)$ depends only on the type of $y^N$. Without loss of generality we assume that 
$h$ is achieved by a type in $T_m$. Hence we have$^{14}$ 
\[ P(a_1) = \sum_{y^N : z(y^N) = a_1} Q(y^N|\star) \ge e^{-N D(Q(\cdot|x)\|Q(\cdot|\star))(1+\eta)}\, \mathrm{poly}(N) \]
where $x$ is the $N$ times repeated symbol for the codeword $c(m)$, and where $\eta = \eta(\mu) > 0$ goes 
to zero as $\mu$ vanishes. It follows that $sP(a_1)$ grows exponentially with $N$ provided $\mu$ is small 
enough. Thus the condition $1/(s\delta_0) \le P(a_1)$ is trivially satisfied for any $\delta_0 \in (0,1)$. Also, our 
choice of $a_2$ gives $1/(s\delta_0) \le P(a_2)$ for any $\delta_0$. This is because $P(a_2) \ge \mathrm{poly}(N)$ since there 
are polynomially many types of length $N$ and $a_2$ is generated by the type of the highest 
probability. Finally, that the conditions $\min\{\hat{P}(a_1)/P(a_1), \hat{P}(a_2)/P(a_2)\} \ge \delta_0$ and $\hat{P}(a_2) > 1/s$ are satisfied 
follows from the definition of $\hat{P}$. 

Finally we show that 
\[ P^s\big(Z^s \text{ has typical type } \hat{P}\big) \]
can be made arbitrarily close to one as $N$ tends to infinity, justifying the step from (9) to (10). 
Using Chebyshev's inequality and the fact that the variance of a binomial is dominated by its 
expectation we get$^{15}$ 
\[ P^s\left(\left|\frac{1}{sP(a_1)}\sum_{i=1}^{s} \mathbb{1}_{a_1}(Z_i) - 1\right| \ge \mu\right) \le \frac{1}{s\mu^2 P(a_1)} \]
which goes to zero as $N \to \infty$ since we proved above that $sP(a_1)$ grows (exponentially) with 
$N$. A similar argument shows that $P^s(|\hat{P}_{Z^s}(a_2)/P(a_2) - 1| \ge \mu)$ vanishes as $N$ increases. Since 
\[ P^s\big(Z^s \text{ has typical type } \hat{P}\big) = P^s\left(\left|\frac{\hat{P}_{Z^s}(a_i)}{P(a_i)} - 1\right| \le \mu,\; i = 1, 2\right) \]
the claim is proved. 

12 The type class of $\tilde{P}$ is the set of all sequences $z^s$ that have type $\tilde{P}$. 

13 This step follows by noting first that $a_1 = e^{N D(Q(\cdot|x)\|Q(\cdot|\star))(1+o(1))}$ as $\mu \to 0$ and $N \to \infty$, and second that $a_2/a_1 = o(1)$ as 
$N \to \infty$ (for $\mu > 0$ small enough). 

14 Throughout the paper we use the notation $\mathrm{poly}(N)$ to denote any term that is either a polynomial in $N$ or the inverse of a polynomial 
in $N$. 

■ 

The direct part of Theorem 1 is obtained by a random coding argument associated with the 
scheme presented in Section III-A. We assume that all the components of all codewords are 
chosen i.i.d. according to some distribution $P$ to be specified later. Given that message $m$ starts 
being emitted at time $l$, we bound the probability of error as 
\[ \mathbb{P}_{m,l}(\mathcal{E}) \le \mathbb{P}_{m,l}\Big(\min_{m' \ne m} \tau_{m'} \le l + N - 1\Big) + \mathbb{P}_{m,l}(\tau_m \ge l + N) \]
with $\tau_m$ as defined earlier, which is interpreted as the sum of the probability of false-alarm and 
the probability of missing the correct codeword. In order to upper bound the above two terms, 
let us define the event $E(m,n,i,k)$ as the intersection of the events 
\[ k\, I\big(\hat{P}_{C^k(m),\, Y_{n-i+1}^{n-i+k}}\big) + (i-k)\, I\big(\hat{P}_{C_{k+1}^{i}(m),\, Y_{n-i+k+1}^{n}}\big) \ge t_2 \ln M \]
and $i\, D(\hat{P}_{Y_{n-i+1}^{n}} \| Q(\cdot|\star)) \ge t_1 \ln M$. Also let $E(m,n,i) = \cap_{k=1,2,\ldots,i} E(m,n,i,k)$. We interpret 
$E(m,n,i)$ as the event that message $m$ is declared at time $n$ by observing the last $i$ symbols. 
With these definitions we have$^{16}$ 
\[ \mathbb{P}_{m,l}\Big(\min_{m' \ne m} \tau_{m'} \le l + N - 1\Big) \le \sum_{\substack{m' \ne m \\ n \in [1,\ldots,A+N-1] \\ i \in [1,\ldots,N \wedge n]}} \mathbb{P}_{m,l}(E(m',n,i)) \tag{12} \]
from the union bound, and 
\[ \mathbb{P}_{m,l}(\tau_m \ge l + N) \le \mathbb{P}_{m,l}\big(E(m, l+N-1, N)^c\big) . \tag{13} \]
Lemmas 2 and 3 below upper bound the right hand sides of (12) and (13). 
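
To make the definition of $E(m,n,i)$ concrete, the following sketch tests both defining conditions on a window of the last $i$ outputs: the empirical divergence from the noise distribution and, for every split point $k$, the weighted empirical mutual information. The function names, the toy codeword, and the noise distribution are assumptions made for this illustration; the sketch is not the decoder of Section III-A itself.

```python
import math
from collections import Counter

def empirical_mutual_information(xs, ys):
    """Mutual information (nats) of the joint type of the pairs (xs[j], ys[j])."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def empirical_divergence(ys, noise):
    """D(type of ys || noise), with noise a dict mapping symbol -> probability."""
    n = len(ys)
    total = 0.0
    for y, c in Counter(ys).items():
        if noise.get(y, 0.0) == 0.0:
            return math.inf
        total += (c / n) * math.log((c / n) / noise[y])
    return total

def event_E(codeword_prefix, window, noise, t1, t2, M):
    """True if both conditions defining E(m, n, i) hold, where `window` holds the
    last i outputs and `codeword_prefix` the first i symbols of codeword m."""
    i = len(window)
    threshold1, threshold2 = t1 * math.log(M), t2 * math.log(M)
    if i * empirical_divergence(window, noise) < threshold1:
        return False
    for k in range(1, i + 1):
        score = (k * empirical_mutual_information(codeword_prefix[:k], window[:k])
                 + (i - k) * empirical_mutual_information(codeword_prefix[k:], window[k:]))
        if score < threshold2:
            return False
    return True

# Hypothetical toy data: a noiseless observation of an alternating codeword.
codeword = [0, 1] * 8
noise = {0: 0.99, 1: 0.01}
print(event_E(codeword, codeword, noise, t1=1.0, t2=1.0, M=2))
print(event_E(codeword, [0] * 16, noise, t1=1.0, t2=1.0, M=2))
```

The first call passes both tests, while the all-zero window looks like pure noise and fails the divergence test.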

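We denote by $\mathcal{P}$, $\mathcal{P}^{\mathcal{X}}$, and $\mathcal{P}^{\mathcal{Y}}$ the set of all distributions on $\mathcal{X} \times \mathcal{Y}$, $\mathcal{X}$, and $\mathcal{Y}$ respectively. 
Later we will also use $\mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ to denote the set of conditional distributions of the form $V(y|x)$ 
with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Further we denote by $\mathcal{P}_n$ the set of all types of length $n$ over $\mathcal{X} \times \mathcal{Y}$, 
and similarly for $\mathcal{P}_n^{\mathcal{X}}$ and $\mathcal{P}_n^{\mathcal{Y}}$. As mentioned earlier, the notation $\mathrm{poly}(N)$ is used for a term 
that grows no faster than polynomially in $N$. 

15 Here $\mathbb{1}_{a_1}(Z_i)$ equals 1 if $Z_i = a_1$, zero else. 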

16 The notation $a \wedge b$ is used for the minimum of $a$ and $b$. 

Lemma 2 (false-alarm). Assume the codebook to be randomly generated so that each sample of 
each codeword is i.i.d. according to some distribution $P$. For any threshold constants $t_1, t_2 \in \mathbb{R}$ 
and asynchronism level $A \ge 1$ 
\[ \sum_{\substack{m' \ne m \\ n \in [1,\ldots,A+N-1] \\ i \in [1,\ldots,N \wedge n]}} \mathbb{P}_{m,l}(E(m',n,i)) \le \big(M^{-(t_1+t_2-1)} A + M^{-(t_2-1)}\big)\, \mathrm{poly}(N) . \]
Notice that the above bound on the false-alarm error probability does not depend on $P$. Also 
notice that if $t_1 + t_2 < 1$ or $t_2 < 1$ the lemma is trivial. 
Proof of Lemma 2: 

We distinguish the cases when $E(m',n,i)$ is generated outside the message transmission 
period and when it is generated partly outside and partly inside the message transmission period. 
In both cases we will use the identity 
\[ D(V\|P_1P_2) = I(V) + D(V_X\|P_1) + D(V_Y\|P_2) , \tag{14} \]
where $V$ denotes any distribution on $\mathcal{X} \times \mathcal{Y}$ with marginals $V_X$ and $V_Y$, and where $P_1$ and $P_2$ 
are any distributions on $\mathcal{X}$ and $\mathcal{Y}$ respectively. 
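
The identity (14) is easy to confirm numerically. The sketch below evaluates both sides for a small joint distribution; the particular values of $V$, $P_1$, and $P_2$ are arbitrary choices made for this example, and all entries are assumed strictly positive so that every divergence is finite.

```python
import math

def kl(p, q):
    """D(p||q) in nats for two probability vectors with q strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def check_identity(V, P1, P2):
    """Return both sides of D(V||P1 P2) = I(V) + D(V_X||P1) + D(V_Y||P2),
    where V is a joint distribution given as a list of rows."""
    VX = [sum(row) for row in V]            # marginal on X
    VY = [sum(col) for col in zip(*V)]      # marginal on Y
    flat_V = [v for row in V for v in row]  # row-major flattening
    lhs = kl(flat_V, [a * b for a in P1 for b in P2])
    mutual_info = kl(flat_V, [a * b for a in VX for b in VY])
    rhs = mutual_info + kl(VX, P1) + kl(VY, P2)
    return lhs, rhs

V = [[0.30, 0.10], [0.15, 0.45]]
lhs, rhs = check_identity(V, P1=[0.5, 0.5], P2=[0.7, 0.3])
print(lhs, rhs)
```

Since every term on the right hand side is nonnegative, the identity also gives $D(V\|P_1P_2) \ge I(V)$, which is the form in which it is used in the proofs.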

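Case I: $E(m',n,i)$ is generated outside the message transmission period (i.e., $n < l$ or 
$n - i + 1 \ge l + N$) 

By definition $E(m',n,i) \subseteq E(m',n,i,i)$, hence from [4, Theorem 12.1.4] and (14) we get 
\begin{align*}
\mathbb{P}_{m,l}(E(m',n,i)) &\le \mathbb{P}_{m,l}(E(m',n,i,i)) \\
&\le \sum_{\substack{V \in \mathcal{P}_i \\ iI(V) \ge t_2 \ln M \\ iD(V_Y\|Q(\cdot|\star)) \ge t_1 \ln M}} e^{-iD(V\|PQ(\cdot|\star))} \\
&\le \sum_{\substack{V \in \mathcal{P}_i \\ iI(V) \ge t_2 \ln M \\ iD(V_Y\|Q(\cdot|\star)) \ge t_1 \ln M}} e^{-iI(V) - iD(V_Y\|Q(\cdot|\star))} \\
&\le (i+1)^{|\mathcal{X}||\mathcal{Y}|} M^{-t_2} M^{-t_1} \\
&\le \mathrm{poly}(N)\, M^{-t_2} M^{-t_1} \tag{15}
\end{align*}
where the last two inequalities hold since $|\mathcal{P}_i| \le (i+1)^{|\mathcal{X}||\mathcal{Y}|}$ by [5, Lemma 2.2] and because 
$i \le N$. 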
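
The polynomial bound on the number of types can be checked directly: the number of types of length $n$ over an alphabet of size $K$ is the number of ways to write $n$ as an ordered sum of $K$ nonnegative integers. The sketch below compares this exact count with $(n+1)^K$; the alphabet size used is a hypothetical example.

```python
from math import comb

def number_of_types(n, alphabet_size):
    """Exact number of types of length n over an alphabet of the given size
    (compositions of n into alphabet_size nonnegative parts)."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)

def type_count_bound(n, alphabet_size):
    """The polynomial upper bound (n + 1)^alphabet_size."""
    return (n + 1) ** alphabet_size

# Hypothetical example: |X| = |Y| = 2, so the joint alphabet has 4 letters.
K = 4
for n in (1, 5, 50, 500):
    print(n, number_of_types(n, K), type_count_bound(n, K))
```

The exact count grows like $n^{K-1}$, so the bound is loose but sufficient for exponent calculations.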

Case II: $E(m',n,i)$ is generated partly outside and partly inside the message transmission 
period (i.e., $n \ge l$ and $n - i + 1 \le l + N - 1$) 

Here the event $E(m',n,i)$ involves the output random variables $Y_{n-i+1}, Y_{n-i+2}, \ldots, Y_n$, the first 
$k$ being distributed according to the noise distribution, and the remaining $i - k$ according to the 
distribution induced by the sent codeword. Since, by definition, $E(m',n,i) \subseteq E(m',n,i,k)$ for 
any $k \in [0, 1, \ldots, i]$, a similar computation as for Case I based on the identity (14) yields 
\begin{align*}
\mathbb{P}_{m,l}(E(m',n,i)) &\le \mathbb{P}_{m,l}(E(m',n,i,k)) \\
&\le \sum_{\substack{V_1 \in \mathcal{P}_k,\, V_2 \in \mathcal{P}_{i-k} \\ kI(V_1) + (i-k)I(V_2) \ge t_2 \ln M}} e^{-kD(V_1\|PQ(\cdot|\star)) - (i-k)D(V_2\|PP_Y)} \\
&\le \sum_{\substack{V_1 \in \mathcal{P}_k,\, V_2 \in \mathcal{P}_{i-k} \\ kI(V_1) + (i-k)I(V_2) \ge t_2 \ln M}} e^{-kI(V_1) - (i-k)I(V_2)} \\
&\le \mathrm{poly}(N)\, M^{-t_2} \tag{16}
\end{align*}
where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. 

Combining the cases I and II we get 
\[ \sum_{\substack{m' \ne m \\ n \in [1,\ldots,A+N-1] \\ i \in [1,\ldots,N \wedge n]}} \mathbb{P}_{m,l}(E(m',n,i)) \le \big(M^{-(t_1+t_2-1)} A + M^{-(t_2-1)}\big)\, \mathrm{poly}(N) \]
yielding the desired result. ■ 
Lemma 3 (miss). Assume the codebook to be randomly generated so that each sample of each 
codeword is i.i.d. according to some distribution $P$. For any threshold constants $t_1 > 0$ and 
$t_2 > 0$ 
\begin{align*}
\mathbb{P}_{m,l}\big(E(m, l+N-1, N)^c\big) \le \mathrm{poly}(N)\Big(&\exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}} \\ D(V\|Q(\cdot|\star)) < t_1 \ln M/N}} D(V\|P_Y)\Big] \\
&+ \exp\Big[-N \min_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M/N}} D(V\|PQ)\Big]\Big)
\end{align*}
where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. (The infimum is defined to be equal to $+\infty$ whenever the 
set over which it is defined is empty.) 
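Proof: The union bound yields 
\begin{align*}
\mathbb{P}_{m,l}\big(E(m, l+N-1, N)^c\big) \le\; &\mathbb{P}_{m,l}\big(N D(\hat{P}_{Y_l^{l+N-1}} \| Q(\cdot|\star)) < t_1 \ln M\big) \\
&+ \sum_{k \in [1,\ldots,N]} \mathbb{P}_{m,l}\Big(k I\big(\hat{P}_{C^k(m),\, Y_l^{l+k-1}}\big) + (N-k) I\big(\hat{P}_{C_{k+1}^{N}(m),\, Y_{l+k}^{l+N-1}}\big) < t_2 \ln M\Big) . \tag{17}
\end{align*}
For the first term on the right hand side of (17) we get 
\[ \mathbb{P}_{m,l}\big(N D(\hat{P}_{Y_l^{l+N-1}} \| Q(\cdot|\star)) < t_1 \ln M\big) \le \mathrm{poly}(N) \exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}} \\ D(V\|Q(\cdot|\star)) < t_1 \ln M/N}} D(V\|P_Y)\Big] \]
where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. To prove the lemma we now show that the second term on 
the right hand side of (17) can be bounded as 
\[ \sum_{k \in [1,\ldots,N]} \mathbb{P}_{m,l}\Big(k I\big(\hat{P}_{C^k(m),\, Y_l^{l+k-1}}\big) + (N-k) I\big(\hat{P}_{C_{k+1}^{N}(m),\, Y_{l+k}^{l+N-1}}\big) < t_2 \ln M\Big) \le \mathrm{poly}(N) \exp\Big[-N \min_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M/N}} D(V\|PQ)\Big] . \]
This is done by the following inequalities 
\begin{align*}
\sum_{k \in [1,\ldots,N]} &\mathbb{P}_{m,l}\Big(k I\big(\hat{P}_{C^k(m),\, Y_l^{l+k-1}}\big) + (N-k) I\big(\hat{P}_{C_{k+1}^{N}(m),\, Y_{l+k}^{l+N-1}}\big) < t_2 \ln M\Big) \\
&\le \sum_{k \in [1,\ldots,N]} \sum_{\substack{V \in \mathcal{P}_k,\, W \in \mathcal{P}_{N-k} \\ kI(V) + (N-k)I(W) < t_2 \ln M}} e^{-kD(V\|PQ) - (N-k)D(W\|PQ)} \\
&\le \mathrm{poly}(N) \exp\Big[-N \min_{\delta \in [0,1]} \min_{(V,W) \in S_\delta} \big(\delta D(V\|PQ) + (1-\delta) D(W\|PQ)\big)\Big] \\
&= \mathrm{poly}(N) \exp\Big[-N \min_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M/N}} D(V\|PQ)\Big] \tag{18}
\end{align*}
where we defined 
\[ S_\delta = \{V, W \in \mathcal{P} : \delta I(V) + (1-\delta) I(W) \le t_2 \ln M/N\} \]
and where the equality in (18) is justified in Lemma 7 given in the Appendix. ■ 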
The following Proposition establishes the direct part of Theorem 1 and will be proved using 
Lemmas 2 and 3. 

Proposition 2 (Achievability). For a channel $Q$ with strictly positive capacity, any asynchronism 
exponent strictly less than 
\[ \max_{x} D(Q(\cdot|x)\,\|\,Q(\cdot|\star)) \]
is achievable by a coding strategy that satisfies $\lim_{N \to \infty} \ln M/N > 0$. 

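Proof: Using Lemmas 2 and 3 we get for any $A \ge 1$, $t_1 > 0$, $t_2 > 1$, and distribution $P$ 
\begin{align*}
\mathbb{P}(\mathcal{E}) \le \mathrm{poly}(N)\Big(&M^{-(t_1+t_2-1)} A + M^{-(t_2-1)} \\
&+ \exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}} \\ D(V\|Q(\cdot|\star)) < t_1 \ln M/N}} D(V\|P_Y)\Big] + \exp\Big[-N \min_{\substack{V \in \mathcal{P} \\ I(V) \le t_2 \ln M/N}} D(V\|PQ)\Big]\Big) \tag{19}
\end{align*}
where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. We focus on the four terms inside the large brackets of 
the above expression. For now we assume that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$, implying that 
$D(P_Y\|Q(\cdot|\star)) < \infty$ for any input distribution $P$. The case where $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$ 
is considered at the end of the proof. 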

Pick an input distribution $P$ so that $I(PQ) > 0$ and $D(P_Y\|Q(\cdot|\star)) > 0$ (this is possible 
since $C(Q) > 0$), fix $t_2 > 1$, and let $\mu > 0$ be a small constant (later we will take $t_2 \to \infty$ and 
$\mu \to 0$). Then choosing the ratio $\ln M/N > 0$ and the constant $t_1 > 0$ so that 
\[ \frac{t_2 \ln M}{N} = I(PQ) - \mu/2 \tag{20} \]
and 
\[ \frac{t_1 \ln M}{N} = D(P_Y\|Q(\cdot|\star)) - \mu/2 , \tag{21} \]
the second, third, and fourth term inside the large brackets in (19) decay exponentially with $N$. 
Now for the first term. From (20) and (21) we get 
\[ t_1 + t_2 = \frac{N}{\ln M}\big(D(P_Y\|Q(\cdot|\star)) + I(PQ) - \mu\big) . \tag{22} \]
For the first term to go to zero exponentially with $N$ we further choose $A = M^{(t_1+t_2-1)/(1+\mu)}$ or, 
equivalently using (20) and (22), 
\begin{align*}
A &= e^{N\left(D(P_Y\|Q(\cdot|\star)) + I(PQ) - \mu - \frac{\ln M}{N}\right)/(1+\mu)} \\
&= e^{N\left(D(P_Y\|Q(\cdot|\star)) + I(PQ) - \mu - t_2^{-1}(I(PQ) - \mu/2)\right)/(1+\mu)} . \tag{23}
\end{align*}
Since $\mu$ can be made arbitrarily small and $t_2$ arbitrarily large we conclude from (23) that, as 
long as $A = e^{N\alpha}$ with 
\[ \alpha < D(P_Y\|Q(\cdot|\star)) + I(PQ) \tag{24} \]
the right hand side of (19) goes to zero as $N$ tends to infinity. Maximizing the right hand side 
of (24) over the input distributions $P$ gives $\max_x D(Q(\cdot|x)\|Q(\cdot|\star))$, yielding the desired result. To 
prove this we show that$^{17}$ 
\[ \sup_{\substack{P \\ D(P_Y\|Q(\cdot|\star)) > 0 \\ I(PQ) > 0}} \big(D(P_Y\|Q(\cdot|\star)) + I(PQ)\big) = \max_{x} D(Q(\cdot|x)\|Q(\cdot|\star)) . \tag{25} \]

17 The domain over which the supremum is taken is nonempty since $C(Q) > 0$. 

Since we assumed that $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$, we have that $D(P_Y\|Q(\cdot|\star)) + I(PQ)$ is 
continuous in $P$ and therefore 
\[ \sup_{\substack{P \\ D(P_Y\|Q(\cdot|\star)) > 0 \\ I(PQ) > 0}} \big(D(P_Y\|Q(\cdot|\star)) + I(PQ)\big) = \max_{P} \big(D(P_Y\|Q(\cdot|\star)) + I(PQ)\big) . \]
Rewriting $D(P_Y\|Q(\cdot|\star)) + I(PQ)$ we get 
\[ D(P_Y\|Q(\cdot|\star)) + I(PQ) = \sum_{x} P(x) D(Q(\cdot|x)\|Q(\cdot|\star)) , \]
which is linear in $P$ and hence maximized by a point mass, so that 
\[ \sup_{\substack{P \\ D(P_Y\|Q(\cdot|\star)) > 0 \\ I(PQ) > 0}} \big(D(P_Y\|Q(\cdot|\star)) + I(PQ)\big) = \max_{x} D(Q(\cdot|x)\|Q(\cdot|\star)) . \]
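
The rewriting step above can be verified numerically. The sketch below computes $D(P_Y\|Q(\cdot|\star)) + I(PQ)$ and $\sum_x P(x) D(Q(\cdot|x)\|Q(\cdot|\star))$ for a small channel and checks that they agree. The channel, input distribution, and noise distribution are hypothetical, and the noise distribution is assumed strictly positive as in this part of the proof.

```python
import math

def kl(p, q):
    """D(p||q) in nats for probability vectors with q strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def both_sides(P, Q, noise):
    """Return (D(P_Y||Q*) + I(PQ), sum_x P(x) D(Q(.|x)||Q*)).
    P is the input distribution, Q[x] the output distribution given input x,
    and noise the output distribution Q(.|*) when nothing is sent."""
    inputs = range(len(P))
    PY = [sum(P[x] * Q[x][y] for x in inputs) for y in range(len(noise))]
    mutual_info = sum(P[x] * kl(Q[x], PY) for x in inputs)
    lhs = kl(PY, noise) + mutual_info
    rhs = sum(P[x] * kl(Q[x], noise) for x in inputs)
    return lhs, rhs

# Hypothetical binary channel and noise distribution.
P = [0.3, 0.7]
Q = [[0.9, 0.1], [0.2, 0.8]]
noise = [0.6, 0.4]

lhs, rhs = both_sides(P, Q, noise)
best = max(kl(row, noise) for row in Q)
print(lhs, rhs, best)
```

Because the right hand side is an average of the per-symbol divergences, it never exceeds the largest of them, which is the value printed last.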

We now focus on the case where $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$. Pick an input distribution $P$ 
such that $I(PQ) > 0$ and $D(P_Y\|Q(\cdot|\star)) = \infty$; one possibility is to take $P$ as the uniform 
distribution over $\mathcal{X}$. Again consider the four terms inside the large brackets in (19). Fix $t_2 > 1$ and 
fix the ratio $\ln M/N$ so that $0 < t_2 \ln M/N < I(PQ)$. It follows that the second and fourth term 
decay exponentially with $N$. Now, with our choice of input distribution note that the third term 
decays exponentially with $N$, irrespective of how large $t_1$ is. By letting $A = M^{t_1}$ it follows 
that the four terms decay exponentially with $N$, irrespective of the exponential growth rate 
of $A$ with respect to $N$. Hence, when $Q(y|\star) = 0$ for some $y \in \mathcal{Y}$, an arbitrarily large asynchronism 
exponent can be achieved. 

(Note that above we always assumed $\ln M/N$ to be some strictly positive constant. Therefore 
the second part of the claim of the proposition follows.) ■ 

To prove Theorem 2 we consider the same random coding argument used in proving Proposition 2 
except that we modify the random codebook ensemble so that each codeword now 
satisfies a certain prefix condition. This condition will allow us to treat the codewords as being 
essentially of constant composition (see, e.g., [5, p.117]) uniformly over their length, yielding 
an improved error probability exponent compared to the case where the codewords are i.i.d. $P$. 

The random construction of a codebook satisfying the prefix condition is obtained as follows. 
Given a message $m$, the codeword $c^N(m)$ is generated so that all of its symbols are i.i.d. 
according to a distribution $P$. If the obtained codeword does not satisfy the prefix condition 
we discard it and regenerate a new codeword until the prefix condition is satisfied. The prefix 
condition requires that all prefixes $c^i(m)$ of size $i$ greater than $N/\ln N$ have empirical type 
$\hat{P}_{c^i(m)}$ close to $P$, in the sense that $\|P - \hat{P}_{c^i(m)}\| \le 1/\ln N$.$^{18}$ If $N$ is large enough, with 
overwhelming probability a random codeword will satisfy the prefix condition. Indeed, by the 
union bound, the probability of generating a sequence $c^N(m)$ that does not satisfy the prefix 
condition is upper bounded by $N \exp[-\Theta(N/(\ln N)^3)]$, which tends to zero as $N$ tends to 
infinity. This proves the following lemma. 

Lemma 4. The probability that a sequence $C_1, C_2, \ldots, C_N$ of random variables i.i.d. according 
to $P$ does not satisfy the prefix condition tends to zero as $N$ goes to infinity. 
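
The codeword construction just described is a rejection sampler, and can be sketched as follows. The function names, the input distribution, and the blocklength are assumptions made for this illustration; the check uses the $L_1$ distance and the thresholds $N/\ln N$ and $1/\ln N$ stated above, and requires $N \ge 2$ so that $\ln N > 0$.

```python
import math
import random
from collections import Counter

def satisfies_prefix_condition(codeword, P):
    """Check that every prefix of length i > N/ln N has an empirical type within
    L1 distance 1/ln N of P, where P maps symbols to probabilities."""
    N = len(codeword)
    tolerance = 1.0 / math.log(N)
    min_len = N / math.log(N)
    counts = Counter()
    for i, symbol in enumerate(codeword, start=1):
        counts[symbol] += 1
        if i > min_len:
            l1 = sum(abs(P[a] - counts[a] / i) for a in P)
            if l1 > tolerance:
                return False
    return True

def draw_codeword(P, N, rng):
    """Draw i.i.d. codewords from P until one satisfies the prefix condition."""
    symbols, weights = zip(*P.items())
    while True:
        codeword = rng.choices(symbols, weights=weights, k=N)
        if satisfies_prefix_condition(codeword, P):
            return codeword

# Hypothetical parameters: uniform binary input distribution, N = 2000.
P = {0: 0.5, 1: 0.5}
rng = random.Random(0)
codeword = draw_codeword(P, 2000, rng)
print(len(codeword), satisfies_prefix_condition(codeword, P))
```

At moderate blocklengths a noticeable fraction of draws may be rejected; Lemma 4 only guarantees that the rejection probability vanishes as $N$ grows.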
To prove Theorem 2 we will need Lemmas 5 and 6 that bound the probabilities of false-alarm 
and miss assuming the codewords satisfy the prefix condition. Before establishing these lemmas 
we make a small digression on the growth rate of $M$ and $N$. Referring to the achievability 
scheme of Section III-A, decoding may happen only if $i$ is so that the condition 
\[ \min_{k \in [1,\ldots,i]} \Big[ k\, I\big(\hat{P}_{C^k(m),\, Y_{n-i+1}^{n-i+k}}\big) + (i-k)\, I\big(\hat{P}_{C_{k+1}^{i}(m),\, Y_{n-i+k+1}^{n}}\big) \Big] \ge t_2 \ln M \]
is satisfied. Thus, a lower bound on the values of $i$ for which decoding may happen is $\ln M/\ln|\mathcal{X}|$ 
since $I(\cdot) \le \ln|\mathcal{X}|$ and $t_2 \ge 1$. In order to guarantee that, whenever decoding happens, only 
codeword prefixes of size larger than $N/\ln N$ (the size of the smallest constant composition 
prefix) are involved, we impose that $M$ and $N$ satisfy 
\[ \frac{N}{\ln N} \le \frac{\ln M}{\ln|\mathcal{X}|} . \tag{26} \]

18 Here $\|\cdot\|$ is the $L_1$ norm. Also, the choice $N/\ln N$ for the minimum prefix size could be replaced by any function $f(N)$ so that 
$f(N) = o(N)$ while $\ln N/f(N) = o(1)$. 


Lemma 5 (false-alarm, with prefix condition). Assume the codebook to be randomly generated 
so that each codeword satisfies the prefix condition according to $P$, and assume that (26) holds. 
For any threshold constants $t_1, t_2 \in \mathbb{R}$ and any asynchronism level $A \ge 1$ 
\[ \sum_{\substack{m' \ne m \\ n \in [1,\ldots,A+N-1] \\ i \in [1,\ldots,N \wedge n]}} \mathbb{P}_{m,l}(E(m',n,i)) \le \mathrm{poly}(N)\big(M^{-(t_1+t_2-1+o(1))} A + M^{-(t_2-1+o(1))}\big) \]
as $N \to \infty$. 

Lemma 6 (miss, with prefix condition). Assume the codebook to be randomly generated so that 
each codeword satisfies the prefix condition according to $P$ and assume that (26) holds. For 
any $t_1 > 0$ and $t_2 > 0$ 
\begin{align*}
\mathbb{P}_{m,l}\big(E(m, l+N-1, N)^c\big) \le \mathrm{poly}(N)\Big(&\exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|P_Y)(1+o(1))\Big] \\
&+ \exp\Big[-N \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ)(1+o(1))\Big]\Big) \tag{27}
\end{align*}
as $N \to \infty$, where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. 

Comparing Lemma 2 with Lemma 5 and Lemma 3 with Lemma 6 we see that the false-alarm 
probability bounds are essentially the same with and without the prefix condition, whereas for 
the miss probability the bound is improved by the prefix condition. Note also that, for the miss 
probability, the bound obtained with the prefix condition is the sum of two terms that involve 
convex optimizations, whereas the bound without the prefix condition involves a non convex 
optimization, in general more difficult to handle. To prove Lemmas 5 and 6 we use similar 
bounding techniques as in the proofs of Lemmas 2 and 3 together with the following argument. 

Suppose $\{(C_i, Y_i)\}_{i=1,\ldots,n}$ is a sequence of i.i.d. pairs of random variables taking values in 
$\mathcal{X} \times \mathcal{Y}$ so that $(C_i, Y_i)$ is distributed according to some $J \in \mathcal{P}$. It then follows, by [4, 
Theorem 12.1.4], that for a given type $V = V_X V_{Y|X}$ in $\mathcal{P}_n$ 
\[ \mathbb{P}\big((C^n, Y^n) \text{ has type } V\big) \le e^{-nD(V_X V_{Y|X}\|J)} , \tag{28} \]
which implies that 
\[ \mathbb{P}\big((C^n, Y^n) \text{ has type } V \,\big|\, C^n \text{ satisfies prefix condition}\big)\, \mathbb{P}\big(C^n \text{ satisfies prefix condition}\big) \le e^{-nD(V_X V_{Y|X}\|J)} . \tag{29} \]
Now assuming that $n$ is larger than $N/\ln N$, the smallest prefix length to which the prefix 
condition applies, we have that 
\[ \mathbb{P}\big((C^n, Y^n) \text{ has type } V \,\big|\, C^n \text{ satisfies the prefix condition}\big) \]
is nonzero only if $\|V_X - P\| \le 1/\ln N$. Assuming so, since the probability that 
$C^n$ satisfies the prefix condition tends to one as $n \to \infty$ (Lemma 4) we conclude from (29) and 
by continuity of $D(\cdot\|J)$ that 
\[ \mathbb{P}\big((C^n, Y^n) \text{ has type } V \,\big|\, C^n \text{ satisfies prefix condition}\big) \le e^{-nD(PV_{Y|X}\|J)(1+o(1))} \tag{30} \]
as $N \to \infty$. 

Comparing (28) and (30) we see that the prefix condition essentially allows us to treat $C^n$ as 
being of composition $P$. Accordingly, to prove Lemmas 5 and 6 we follow the steps of the proofs 
of Lemmas 2 and 3 and repeatedly use the above argument (without explicitly mentioning it 
everywhere) in order to incorporate the prefix condition and change the large deviations exponent 
of the form $D(V_X V_{Y|X}\|J)$ to $D(PV_{Y|X}\|J)$. The only additional technicality relates to the small 
discrepancy that occurs because the prefix condition does not hold for small prefix lengths, i.e., 
lengths smaller than $N/\ln N$. We recall that $M$ and $N$ are assumed to satisfy (26). 
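Proof of Lemma 5: 

Case I: $E(m',n,i)$ is generated outside the message transmission period (i.e., $n < l$ or 
$n - i + 1 \ge l + N$) 

A similar computation as in (15) yields as $N \to \infty$ 
\begin{align*}
\mathbb{P}_{m,l}(E(m',n,i)) &\le \mathbb{P}_{m,l}(E(m',n,i,i)) \\
&\le \sum_{\substack{V \in \mathcal{P}_i,\, V_X \approx P \\ iI(V) \ge t_2 \ln M \\ iD(V_Y\|Q(\cdot|\star)) \ge t_1 \ln M}} e^{-iD(V\|PQ(\cdot|\star))(1+o(1))} \\
&\le \sum_{\substack{V \in \mathcal{P}_i,\, V_X \approx P \\ iI(V) \ge t_2 \ln M \\ iD(V_Y\|Q(\cdot|\star)) \ge t_1 \ln M}} e^{-i(I(V) + D(V_Y\|Q(\cdot|\star)))(1+o(1))} \\
&\le \mathrm{poly}(N)\, M^{-t_2 - t_1 + o(1)} ,
\end{align*}
where $V_X \approx P$ denotes $\|V_X - P\| \le 1/\ln N$. 

Case II: $E(m',n,i)$ is generated partly outside and partly inside the message transmission 
period (i.e., $n \ge l$ and $n - i + 1 \le l + N - 1$) 

The event $E(m',n,i)$ involves the output random variables $Y_{n-i+1}, Y_{n-i+2}, \ldots, Y_n$, the first $k$ 
being distributed according to the noise distribution, and the remaining $i - k$ according to the 
distribution induced by the sent codeword. In order to deal with the discrepancy that results 
because codeword lengths of size smaller than $N/\ln N$ do not satisfy the prefix condition, we 
distinguish two cases. 

• $k \ge N/\ln N$ and $i - k \ge N/\ln N$ 

A similar computation as in (16) yields 
\begin{align*}
\mathbb{P}_{m,l}(E(m',n,i)) &\le \sum_{\substack{V_1 \in \mathcal{P}_k,\, V_2 \in \mathcal{P}_{i-k} \\ V_{1,X} \approx P,\, V_{2,X} \approx P \\ kI(V_1) + (i-k)I(V_2) \ge t_2 \ln M}} e^{-(kD(V_1\|PQ(\cdot|\star)) + (i-k)D(V_2\|PP_Y))(1+o(1))} \\
&\le \sum_{\substack{V_1 \in \mathcal{P}_k,\, V_2 \in \mathcal{P}_{i-k} \\ V_{1,X} \approx P,\, V_{2,X} \approx P \\ kI(V_1) + (i-k)I(V_2) \ge t_2 \ln M}} e^{-(kI(V_1) + (i-k)I(V_2))(1+o(1))} \\
&\le \mathrm{poly}(N)\, M^{-t_2 + o(1)}
\end{align*}
where $P_Y(\cdot) = \sum_{x \in \mathcal{X}} P(x)Q(\cdot|x)$. 

• $k < N/\ln N$ or $i - k < N/\ln N$ 

We consider only the case $k < N/\ln N$, the case $i - k < N/\ln N$ being obtained in the 
same way. Since $I(V) \le \ln|\mathcal{X}|$ we have as $N \to \infty$ 
\begin{align*}
\mathbb{P}_{m,l}(E(m',n,i)) &\le \sum_{\substack{V \in \mathcal{P}_{i-k},\, V_X \approx P \\ (N/\ln N)\ln|\mathcal{X}| + (i-k)I(V) \ge t_2 \ln M}} e^{-(i-k)D(V\|PP_Y)(1+o(1))} \\
&\le \sum_{\substack{V \in \mathcal{P}_{i-k},\, V_X \approx P \\ (N/\ln N)\ln|\mathcal{X}| + (i-k)I(V) \ge t_2 \ln M}} e^{-(i-k)I(V)(1+o(1))} \\
&\le \mathrm{poly}(N)\, M^{-t_2 + o(1)} .
\end{align*}
Combining the cases I and II we get as $N \to \infty$ 
\[ \sum_{\substack{m' \ne m \\ n \in [1,\ldots,A+N-1] \\ i \in [1,\ldots,N \wedge n]}} \mathbb{P}_{m,l}(E(m',n,i)) \le \big(M^{-(t_1+t_2-1+o(1))} A + M^{-(t_2-1+o(1))}\big)\, \mathrm{poly}(N) \]
yielding the desired result. ■ 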
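Proof of Lemma 6: According to the proof of Lemma 3 we need to bound 
\[ \mathbb{P}_{m,l}\big(N D(\hat{P}_{Y_l^{l+N-1}} \| Q(\cdot|\star)) < t_1 \ln M\big) \]
and 
\[ \sum_{k \in [1,\ldots,N]} \mathbb{P}_{m,l}\Big(k I\big(\hat{P}_{C^k(m),\, Y_l^{l+k-1}}\big) + (N-k) I\big(\hat{P}_{C_{k+1}^{N}(m),\, Y_{l+k}^{l+N-1}}\big) < t_2 \ln M\Big) . \]
For the first term we apply the argument that precedes the proof of Lemma 5 and immediately obtain 
\[ \mathbb{P}_{m,l}\big(N D(\hat{P}_{Y_l^{l+N-1}} \| Q(\cdot|\star)) < t_1 \ln M\big) \le \mathrm{poly}(N) \exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|P_Y)(1+o(1))\Big] \tag{31} \]
as $N \to \infty$. For the second term we proceed along the lines of the set of inequalities (18) 
and, similarly to Case II in the proof of Lemma 5, we separately consider the situations 
$k < N/\ln N$ and $k \ge N/\ln N$. This yields 
\[ \sum_{k \in [1,\ldots,N]} \mathbb{P}_{m,l}\Big(k I\big(\hat{P}_{C^k(m),\, Y_l^{l+k-1}}\big) + (N-k) I\big(\hat{P}_{C_{k+1}^{N}(m),\, Y_{l+k}^{l+N-1}}\big) < t_2 \ln M\Big) \le \mathrm{poly}(N) \exp\Big[-N \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ)(1+o(1))\Big] \]
as $N \to \infty$, which concludes the proof. ■ 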
Proof of Theorem 2: The proof is obtained by deriving bounds on the average decoding 
delay $(\tau - \nu)^+$ and on the probability of the error event $\mathcal{E}$. In what follows we assume that the ratio 
$\ln M/N$ remains fixed as $N \to \infty$ so that (26) is satisfied. This in turn allows us to use Lemmas 
5 and 6. Also, from now on we assume that $P$ is so that $I(PQ) > 0$. 
The average decoding delay is bounded as 
\begin{align*}
\mathbb{E}_{m,l}(\tau - l)^+ &\le \mathbb{E}_{m,l}(\tau_m - l)^+ \\
&= \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m < l+N}(\tau_m - l)^+\big) + \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m \ge l+N}(\tau_m - l)^+\big) \tag{32}
\end{align*}

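where $\mathbb{1}_{\tau_m \ge l+N}$ is equal to one if $\tau_m \ge l + N$, zero else. 
For the first term on the right hand side of (32) we have 
\[ \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m < l+N}(\tau_m - l)^+\big) \le j + N\, \mathbb{P}_{m,l}(\tau_m \ge l + j) , \tag{33} \]
where$^{19}$ 
\[ j \triangleq \frac{t_2 \ln M (1 + 1/M)}{I(PQ)}\, d(\delta) \]
with 
\[ d(\delta) \triangleq \frac{I(PQ)}{\min_{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} :\, D(PV\|PQ) \le \delta} I(PV)} \tag{34} \]
and $\delta = \delta(M) = 1/\sqrt{\ln M}$. For now we assume that 
\[ j = \frac{t_2 \ln M}{I(PQ)}(1 + o(1)) \quad \text{as } N \to \infty \tag{35} \]
and show that the term $N\, \mathbb{P}_{m,l}(\tau_m \ge l + j)$ goes to zero as $N$ tends to infinity; the equality 
(35) will be shown at the end of the proof. Using the inequality (27) with $N$ replaced by $j$ 
yields 
\begin{align*}
\mathbb{P}_{m,l}(\tau_m \ge l + j) &\le \mathbb{P}_{m,l}\big(E(m, l+j-1, j)^c\big) \\
&\le \mathrm{poly}(N)\Big(\exp\Big[-j \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/j}} D(PV\|PQ)(1+o(1))\Big] \\
&\qquad\qquad + \exp\Big[-j \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/j}} D((PV)_Y\|P_Y)(1+o(1))\Big]\Big) . \tag{36}
\end{align*}
We evaluate the first term in the large brackets in (36). Expanding $d(\delta)$ in the definition of $j$ 
we get 
\[ \frac{t_2 \ln M (1 + 1/M)}{j} = \min_{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} :\, D(PV\|PQ) \le \delta} I(PV) \tag{37} \]
implying that$^{20}$ 
\[ \min_{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} :\, I(PV) \le t_2 \ln M/j} D(PV\|PQ) > \delta . \]
Since $\delta = 1/\sqrt{\ln M}$ we obtain 
\[ \exp\Big[-j \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/j}} D(PV\|PQ)\Big] \le e^{-\Theta(\sqrt{\ln M})} . \tag{38} \]
We now turn to the second term in the large brackets in (36). Since $j = \frac{t_2 \ln M}{I(PQ)}(1 + o(1))$, we 
assume that $P$, $t_1 > 0$, and $t_2 > 1$ satisfy 
\[ t_1 < \frac{t_2\, D(P_Y\|Q(\cdot|\star))}{I(PQ)} \tag{39} \]

19 The term $1/M$ in the definition of $j$ can be replaced by any positive strictly decreasing function of $M$. 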

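20 Here we are using the fact that if for some $\epsilon > 0$ we have $\min_{x :\, g(x) \le c} f(x) = m + \epsilon$, then $\min_{x :\, f(x) \le m} g(x) > c$. 

so that 
\[ \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/j}} D((PV)_Y\|P_Y) > 0 , \]
and hence 
\[ \exp\Big[-j \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/j}} D((PV)_Y\|P_Y)\Big] \le e^{-\Theta(\ln M)} . \tag{40} \]
From (36), (38), and (40) we have 
\[ N\, \mathbb{P}_{m,l}(\tau_m \ge l + j) \to 0 \quad \text{as } N \to \infty , \]
and using (33) and (35) it follows that 
\[ \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m < l+N}(\tau_m - l)^+\big) \le \frac{t_2 \ln M}{I(PQ)}(1 + o(1)) . \tag{41} \]
For the second term on the right hand side of the equality in (32) we get 
\[ \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m \ge l+N}(\tau_m - l)^+\big) \le (A + N)\, \mathbb{P}_{m,l}(\tau_m \ge l + N) \]
since $\tau_m \le A + N - 1$. Further, using Lemma 6, 
\begin{align*}
\mathbb{P}_{m,l}(\tau_m \ge l + N) &\le \mathbb{P}_{m,l}\big(E(m, l+N-1, N)^c\big) \\
&\le \mathrm{poly}(N)\Big(\exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|P_Y)(1+o(1))\Big] \\
&\qquad\qquad + \exp\Big[-N \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ)(1+o(1))\Big]\Big) ,
\end{align*}
and thus 
\begin{align*}
\mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m \ge l+N}(\tau_m - l)^+\big) \le \mathrm{poly}(N)\, A\, \Big(&\exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|P_Y)(1+o(1))\Big] \\
&+ \exp\Big[-N \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ)(1+o(1))\Big]\Big) . \tag{42}
\end{align*}
Letting $A = e^{N\alpha}$ with $\alpha > 0$ we have 
\[ \mathbb{E}_{m,l}\big(\mathbb{1}_{\tau_m \ge l+N}(\tau_m - l)^+\big) = o(1) \quad \text{as } N \to \infty \]
provided that $P$, $t_1 > 0$, $t_2 > 1$, and the ratio $\ln M/N$ can be chosen so that the inequalities 
\begin{align*}
\alpha &< \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|(PQ)_Y) \\
\alpha &< \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ) \tag{43}
\end{align*}
are satisfied. Therefore, if the inequalities from (39) and (43) are satisfied the delay is bounded 
as 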

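\[ \mathbb{E}_{m,l}(\tau_m - l)^+ \le \frac{t_2 \ln M}{I(PQ)}(1 + o(1)) . \tag{44} \]
We now bound the error probability. To that aim we consider the false-alarm and miss events 
and obtain, by Lemmas 5 and 6, 
\begin{align*}
\mathbb{P}(\mathcal{E}) \le \mathrm{poly}(N)\Big(&M^{-(t_1+t_2-1+o(1))} A + M^{-(t_2-1+o(1))} \\
&+ \exp\Big[-N \inf_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ D((PV)_Y\|Q(\cdot|\star)) < t_1 \ln M/N}} D((PV)_Y\|(PQ)_Y)(1+o(1))\Big] \\
&+ \exp\Big[-N \min_{\substack{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} \\ I(PV) \le t_2 \ln M/N}} D(PV\|PQ)(1+o(1))\Big]\Big) . \tag{45}
\end{align*}
Therefore, if in addition to the three inequalities given in (39) and (43) we impose that the ratio 
$\ln M/N$ satisfies 
\[ \frac{\ln M}{N} \ge \frac{\alpha}{\delta(t_1 + t_2 - 1)} \]
for some $\delta \in (0,1)$, the right hand side of (45) goes to zero as $N$ tends to infinity, and using 
(44) we deduce that the asynchronism exponent $\alpha$ can be achieved at rate $I(PQ)/t_2$. 

To summarize, if $P$, $t_1 > 0$, $t_2 > 1$, $\alpha$, and the ratio $\ln M/N$ satisfy the following conditions 

a. $\alpha < \inf D((PV)_Y\|(PQ)_Y)$, the infimum being over $V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ with $D((PV)_Y\|Q(\cdot|\star)) < \frac{\alpha t_1}{\delta(t_1+t_2-1)}$ 

b. $\alpha < \min D(PV\|PQ)$, the minimum being over $V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ with $I(PV) \le \frac{\alpha t_2}{\delta(t_1+t_2-1)}$ 

c. $\dfrac{t_1}{t_2} < \dfrac{D((PQ)_Y\|Q(\cdot|\star))}{I(PQ)}$ (46) 

d. $\dfrac{\ln M}{N} = \dfrac{\alpha}{\delta(t_1+t_2-1)}$ (47) 

for some $\delta \in (0,1)$, then the asynchronism exponent $\alpha$ can be achieved at rate $I(PQ)/t_2$. Note 
that if the conditions a, b, and c are satisfied for some $\alpha$, $P$, $t_1 > 0$, $t_2 > 1$, and $\delta \in (0,1)$ one 
can always choose $N/\ln M$ so that the condition d is satisfied. Hence, if the conditions a, 
b, and c are satisfied for some $\alpha$, $P$, $t_1 > 0$, $t_2 > 1$, and $\delta \in (0,1)$ the asynchronism exponent 
$\alpha$ can be achieved at rate $I(PQ)/t_2$. 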

To conclude the proof we show that $j = \frac{t_2 \ln M}{I(PQ)}(1 + o(1))$. To that aim we show that $d(\delta) = 
1 + o(1)$ as $\delta \to 0$. Since $I(PV)$ is a continuous function over the compact set 
\[ \{V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}} : D(PV\|PQ) \le \delta\} , \tag{48} \]
the minimum in the denominator of the right hand side of (34) is well defined, and so is $d(\delta)$. 
We now show that for $\delta$ small enough, the set in (48) contains no trivial conditional probability 
$V$, that is no $V \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ such that $V(\cdot|x)$ is the same for all $x \in \mathcal{X}$. This will imply that 
$d(\delta) = 1 + o(1)$ as $\delta \to 0$. 

Let $W(x,y) = W_X(x)W_Y(y)$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$. The identity (14) yields 
\begin{align*}
D(PQ\|W) &= I(PQ) + D(P\|W_X) + D(P_Y\|W_Y) \\
&\ge I(PQ) \tag{49}
\end{align*}
where $P_Y(y) = \sum_{x \in \mathcal{X}} P(x)Q(y|x)$. Since the set $\mathcal{P}^{\pi}$ of product measures in $\mathcal{P}$ is compact and 
$D(PQ\|\cdot)$ is continuous over $\mathcal{P}^{\pi}$, from (49) we have 
\[ \min_{W \in \mathcal{P}^{\pi}} D(PQ\|W) \ge I(PQ) . \tag{50} \]
Since $I(PQ) > 0$, from (50) one deduces that $\min_{W \in \mathcal{P}^{\pi}} D(W\|PQ)$ is strictly positive$^{21}$ and 
therefore, for $\delta$ small enough, the set (48) contains no trivial conditional probability. Hence, for $\delta$ small enough 
the denominator in the definition (34) is strictly positive, implying that $d(\delta)$ is finite. We then 
deduce that $d(\delta) = 1 + o(1)$ as $\delta \to 0$. ■ 
V. Concluding remarks 

We introduced a new model for asynchronous and sparse communication and derived scaling 
laws between asynchronism level and blocklength for reliable and quick decoding. Perhaps the 
main conclusion is that even in the regime of strong asynchronism, i.e., when the asynchronism 
level is exponential with respect to the codeword length, reliable and quick decoding can be 
achieved. 

At this point several directions might be pursued. Perhaps the first is the characterization of 
the asynchronism exponent function $\alpha(\cdot, Q)$ at positive rates. In order to make this problem easier 
one may want to consider a less stringent rate definition. Indeed, the definition of rate we adopted 
considers $\mathbb{E}(\tau - \nu)^+$ as delay. As a consequence, in the exponential asynchronism regime we mostly 
focused on, it is difficult to guarantee a high communication rate; even though the probability of 
'missing the codeword' is exponentially small in the codeword length, once the codeword is 
missed we pay a huge penalty in terms of delay, of the order of the asynchronism level which 
is exponentially large in the codeword length. Therefore, instead of imposing $\mathbb{E}(\tau - \nu)^+$ to be 
bounded by some $d$, we may consider a delay constraint of the form $\mathbb{P}((\tau - \nu)^+ \le d) \approx 1$ and 
define the rate as $\ln M/d$. 

Another direction is the extension of the proposed model to include the event when no 
message is sent; the receiver knows that with probability $1 - p$ one message is sent and 
with probability p no message is sent. For this setting 'natural' scalings between p and the 
asynchronism level remain to be discovered. 

Finally a word about feedback. We omitted feedback in our study in order to avoid a potential 
additional source of asynchronism. Nevertheless since feedback is inherently available in any 
communication system it is of interest to include, say, a one-bit perfect feedback from the 
receiver to the transmitter. In this case variable length codes can be used and the asynchronism 
level might be defined directly with respect to $\mathbb{E}(\tau - \nu)^+$ instead of the blocklength. 

VI. Appendix 

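Proof of Lemma 1: The binomial expansion for $P^s(T(\hat{P}))$ (see, e.g., [4, equation 12.25]) 
gives 
\[ P^s(T(\hat{P})) = \binom{s}{s\hat{P}(a_1),\, s\hat{P}(a_2),\, \ldots,\, s\hat{P}(a_{|\mathcal{A}|})} \prod_{a \in \mathcal{A}} P(a)^{s\hat{P}(a)} . \]
Using the hypothesis on $P$, $\hat{P}$, and $\tilde{P}$ gives $\hat{P}(a_i) \ge 3/s$, $i \in \{1,2\}$, hence 
\begin{align*}
\frac{P^s(T(\tilde{P}))}{P^s(T(\hat{P}))} &= \left(\frac{P(a_2)}{P(a_1)}\right)^3 \frac{(s\hat{P}(a_1) - 2)(s\hat{P}(a_1) - 1)(s\hat{P}(a_1))}{(s\hat{P}(a_2) + 1)(s\hat{P}(a_2) + 2)(s\hat{P}(a_2) + 3)} \\
&= \left(\frac{P(a_2)}{P(a_1)}\right)^3 \left(\frac{\hat{P}(a_1)}{\hat{P}(a_2)}\right)^3 \frac{(1 - 1/(s\hat{P}(a_1)))(1 - 2/(s\hat{P}(a_1)))}{(1 + 1/(s\hat{P}(a_2)))(1 + 2/(s\hat{P}(a_2)))(1 + 3/(s\hat{P}(a_2)))} \\
&\ge \delta
\end{align*}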

We use the fact that D(P 1 \\P 2 ) = if and only if Pi = P 2 . 
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for some 5 = S(S ) > 0. ■ 
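
The ratio of type-class probabilities used in this proof can be checked numerically. The sketch below assumes, as the form of the ratio suggests, that the two types differ by moving three occurrences from symbol $a_1$ to symbol $a_2$. The distribution `P`, the counts, and the function names are hypothetical choices made only for illustration.

```python
from math import factorial, prod

def type_class_prob(counts, P):
    """Probability, under i.i.d. P, that a length-s sequence has the given symbol counts."""
    s = sum(counts)
    coeff = factorial(s)
    for c in counts:
        coeff //= factorial(c)  # exact integer multinomial coefficient
    return coeff * prod(p ** c for p, c in zip(P, counts))

def ratio_formula(counts, P):
    """Closed-form ratio when 3 occurrences are moved from symbol a_1 to symbol a_2."""
    n1, n2 = counts[0], counts[1]
    return (P[1] / P[0]) ** 3 * (n1 - 2) * (n1 - 1) * n1 / ((n2 + 1) * (n2 + 2) * (n2 + 3))

P = [0.5, 0.3, 0.2]          # hypothetical source distribution
counts = [10, 6, 4]          # symbol counts of the reference type, s = 20
shifted = [7, 9, 4]          # three occurrences moved from a_1 to a_2
lhs = type_class_prob(shifted, P) / type_class_prob(counts, P)
rhs = ratio_formula(counts, P)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, since the factorials in the multinomial coefficients cancel down to the three-term products in the closed form.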
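Lemma 7. For any distribution $J$ on $\mathcal{X} \times \mathcal{Y}$ and any constant $r \geq 0$, 

$$\min_{t_1 \in [0,1]} \; \min_{\substack{V_1, V_2 \in \mathcal{P}: \\ t_1 I(V_1) + (1 - t_1) I(V_2) \leq r}} \left[ t_1 D(V_1\|J) + (1 - t_1) D(V_2\|J) \right] = \min_{\substack{V \in \mathcal{P}: \\ I(V) \leq r}} D(V\|J).$$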

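Proof: If $r \geq I(J)$ the claim trivially holds, since the left and right hand sides of the above 
equation are both equal to zero. From now on we assume that $r < I(J)$. 
Define 

$$a = \min_{t_1 \in [0,1]} \; \min_{\substack{V_1, V_2 \in \mathcal{P}: \\ t_1 I(V_1) + (1 - t_1) I(V_2) \leq r \\ I(V_1) = I(V_2)}} \left[ t_1 D(V_1\|J) + (1 - t_1) D(V_2\|J) \right]$$

and 

$$b = \min_{t_1 \in [0,1]} \; \inf_{\substack{V_1, V_2 \in \mathcal{P}: \\ t_1 I(V_1) + (1 - t_1) I(V_2) \leq r \\ I(V_1) > I(V_2)}} \left[ t_1 D(V_1\|J) + (1 - t_1) D(V_2\|J) \right].$$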

Since $a = \min_{V \in \mathcal{P}:\, I(V) \leq r} D(V\|J)$, to prove the Lemma it suffices to show that 
$b \geq \min_{V \in \mathcal{P}:\, I(V) \leq r} D(V\|J)$. 

This is done via the following two claims proved below: 

• claim i. $\min_{V:\, I(V) \leq r} D(V\|J) = \min_{V:\, I(V) = r} D(V\|J)$. 

• claim ii. the function $f(r) = \min_{V:\, I(V) = r} D(V\|J)$ is convex. 
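Using the above claims we have 

$$b = \inf_{r_1 > r_2} \left[ \frac{r - r_2}{r_1 - r_2}\, f(r_1) + \frac{r_1 - r}{r_1 - r_2}\, f(r_2) \right] \geq f(r),$$

where the inequality uses the convexity of $f$ together with the identity 
$\frac{r - r_2}{r_1 - r_2}\, r_1 + \frac{r_1 - r}{r_1 - r_2}\, r_2 = r$, and therefore 
$b \geq \min_{V \in \mathcal{P}:\, I(V) \leq r} D(V\|J)$. 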

The proof of the above claims is based on the convexity of $D(J_1\|J_2)$ in the pair $(J_1, J_2)$ 
(see, e.g., Lemma 3.5, p. 50]). For claim i, let $r > 0$ and suppose that $I(V) < r$. By defining 
$\bar V = \lambda V + (1 - \lambda) J$ with $\lambda \in [0, 1)$ we have $D(\bar V\|J) < D(V\|J)$ by convexity. On the other 
hand, letting $V_X$ and $V_Y$ denote the left and right marginals of $V$, we have 

$$I(\bar V) = D(\lambda V + (1 - \lambda) J \,\|\, \lambda V_X V_Y + (1 - \lambda) J_X J_Y)$$
$$\leq \lambda D(V\|V_X V_Y) + (1 - \lambda) D(J\|J_X J_Y)$$
$$= \lambda I(V) + (1 - \lambda) I(J)$$
$$< r$$

where the last inequality holds for $\lambda$ sufficiently close to one. Therefore $\bar V$ strictly improves upon 
$V$ and claim i follows. 

For claim ii, let $V_1$ and $V_2$ achieve $f(r_1)$ and $f(r_2)$, for some $r_1 \neq r_2$, and let 
$V = \lambda V_1 + (1 - \lambda) V_2$. By convexity we have 

$$D(V\|J) \leq \lambda D(V_1\|J) + (1 - \lambda) D(V_2\|J) = \lambda f(r_1) + (1 - \lambda) f(r_2)$$

and $I(V) \leq r$. This yields claim ii. ■ 
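
The mixing step behind claim i can be illustrated numerically. In the sketch below, the joint distribution `J`, the starting point `V`, and the mixing weights are hypothetical values on a 2x2 alphabet, chosen only to show that moving `V` toward `J` lowers the divergence to `J` while the mutual information changes continuously. It does not verify the lemma itself.

```python
import math

def kl(A, B):
    """KL divergence D(A||B) in nats for distributions given as flat lists."""
    return sum(a * math.log(a / b) for a, b in zip(A, B) if a > 0)

def mutual_information(V):
    """I(V) = D(V || V_X V_Y) for a 2x2 joint distribution V = [v00, v01, v10, v11]."""
    vx = [V[0] + V[1], V[2] + V[3]]   # left marginal
    vy = [V[0] + V[2], V[1] + V[3]]   # right marginal
    prod_marg = [vx[i] * vy[j] for i in range(2) for j in range(2)]
    return kl(V, prod_marg)

J = [0.4, 0.1, 0.1, 0.4]          # hypothetical joint distribution with I(J) > 0
V = [0.25, 0.25, 0.25, 0.25]      # product distribution, so I(V) = 0
for lam in (1.0, 0.9, 0.5):
    # Mixture lam * V + (1 - lam) * J, as in the proof of claim i
    V_bar = [lam * v + (1 - lam) * j for v, j in zip(V, J)]
    print(lam, kl(V_bar, J), mutual_information(V_bar))
```

As the weight on `J` grows, the printed divergence decreases and the mutual information increases from zero, in line with the argument that a minimizer under the constraint $I(V) \leq r$ with $r < I(J)$ can be taken on the boundary $I(V) = r$.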

If $r = 0$ the claim holds trivially. 
Notice that in [5, p. 169] a similar argument holds for the sphere packing exponent. 